<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>R-bloggers</title>
	<atom:link href="https://www.r-bloggers.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.r-bloggers.com</link>
	<description>R news and tutorials contributed by hundreds of R bloggers</description>
	<lastBuildDate>Fri, 03 Apr 2026 00:00:00 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.5.18</generator>

<image>
	<url>https://i0.wp.com/www.r-bloggers.com/wp-content/uploads/2016/08/cropped-R_single_01-200.png?fit=32%2C32&#038;ssl=1</url>
	<title>R-bloggers</title>
	<link>https://www.r-bloggers.com</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">11524731</site>	<item>
		<title>You can just build your own programming language</title>
		<link>https://www.r-bloggers.com/2026/04/you-can-just-build-your-own-programming-language/</link>
		
		<dc:creator><![CDATA[Econometrics and Free Software]]></dc:creator>
		<pubDate>Fri, 03 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://b-rodrigues.github.io/posts/2026-04-03-tproject.html</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Last summer, while relaxing on the beaches of Berck, a French town known for treating tuberculosis in kids by exposing them to the fresh maritime air (back in the 19th century; there are antibiotics these days), I found myself daydreaming abou...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/you-can-just-build-your-own-programming-language/">You can just build your own programming language</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://b-rodrigues.github.io/posts/2026-04-03-tproject.html"> Econometrics and Free Software</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 




<div style="text-align: center;">
<p>
<a> <img src="https://i0.wp.com/b-rodrigues.github.io/assets/img/tlogo.png?w=578&#038;ssl=1" style="width: 50%; height: auto;" data-recalc-dims="1"> </a>
</p>
</div>
<p>Last summer, while relaxing on the beaches of Berck, a French town known for treating tuberculosis in kids by exposing them to the fresh maritime air (back in the 19th century; there are antibiotics these days), I found myself daydreaming about building my own programming language.</p>
<p>Spoiler alert: I don’t know how to build programming languages, but I have developed extremely strong opinions over the years about the features a modern data science language <em>should</em> have. So could I use them fancy LLMs to build one?</p>
<p>Also, let’s get one question answered straight away: why create a new language instead of contributing to existing ones? I certainly do contribute: I maintain several R packages like <code>{rix}</code>, <code>{rixpress}</code>, and <code>{chronicler}</code>, and even two Python packages (<code>cronista</code> and <code>ryxpress</code>). But I wanted a clean slate to build a system centered around a few non-negotiable principles and features I’ve implemented over the years in R:</p>
<ul>
<li><strong>Reproducibility-First</strong>: A language where reproducibility isn’t a bolt-on afterthought managed by external tools, but the very foundation of the runtime.</li>
<li><strong>Aggressive Re-use</strong>: Instead of reinventing the wheel, this language would stand on the shoulders of giants. It’d use <strong>Nix</strong> for package management and environment isolation, and <strong>Apache Arrow</strong> as its high-performance backbone for data frames. R, Python, Julia and other languages would provide the algorithms and models.</li>
<li><strong>First-Class Pipelines</strong>: Scripts shouldn’t be a sequence of side-effects. In this language, pipelines would be mandatory and first-class citizens.</li>
<li><strong>Fail Early and Loudly</strong>: No silent type conversions or hidden NAs. If something is wrong, the language breaks immediately so you can fix it.</li>
<li><strong>Errors as Objects</strong>: Inspired by functional programming, errors are first-class values that can be inspected and handled gracefully.</li>
<li><strong>Two Pipes</strong>: I want two pipes, one for linear transformations, <code>|&gt;</code>, and a maybe-pipe, <code>?|&gt;</code> for error recovery. Unlike the standard pipe, <code>?|&gt;</code> always forwards its value, including Errors, to the next function, allowing you to write handlers that inspect and potentially recover from them. Since Errors are just values, this composes naturally with the rest of the language.</li>
<li><strong>Polyglot by Design</strong>: Rather than re-implementing every statistical algorithm, this language would be designed to orchestrate and bridge R, Python, and Julia seamlessly.</li>
</ul>
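<p>To make the "errors as objects" and two-pipe bullets concrete, here is a small Python sketch of the intended semantics. This is illustrative only, not T's implementation: the names <code>Err</code>, <code>pipe</code>, <code>maybe_pipe</code>, <code>parse_num</code>, and <code>recover_with_zero</code> are all made up for this example.</p>

```python
from dataclasses import dataclass

@dataclass
class Err:
    message: str  # an error is just a value carrying context

def pipe(value, fn):
    """Standard pipe |>: short-circuits once an Err appears."""
    if isinstance(value, Err):
        return value
    return fn(value)

def maybe_pipe(value, fn):
    """Maybe-pipe ?|>: always forwards the value, Err included,
    so the next function can inspect and possibly recover."""
    return fn(value)

def parse_num(s):
    try:
        return float(s)
    except ValueError:
        return Err(f"not a number: {s!r}")

def recover_with_zero(v):
    # A handler: because Err is a plain value, it composes like any function.
    return 0.0 if isinstance(v, Err) else v

result = maybe_pipe(pipe("oops", parse_num), recover_with_zero)
```

<p>Here the ordinary pipe carries the <code>Err</code> through untouched, and the maybe-pipe hands it to a handler that recovers with a default value.</p>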
<p>Also, we’re in a post-LLM world, and like them or not, they’re here to stay. They’re pretty useful for writing boilerplate code, so any new language would be dead on arrival if it didn’t play nicely with LLMs. So such a new language would need to be written for LLMs primarily, because I don’t expect anyone to learn any new language. This is where the declarative nature of Nix is a huge advantage. Because environments are precisely described, it is much easier for LLMs to focus on generating code and not have to fight with environment setup. This is also the reason I took another radical decision: since Nix would be mandatory for setting up the environment, why bother building OS-specific binaries? I’d just build a Nix package for this language and let Nix handle the rest.</p>
<p>This architecture results in a DSL for orchestration, making it trivial to transfer data objects between different ecosystems without the usual FFI (Foreign Function Interface) friction.</p>
<p>With these ideas in mind, I started prompting Gemini to brainstorm and began by generating specification files. Very broad at first, but as days went by, more and more focused. The way I went about it (and still do) is that I first brainstorm an idea with an LLM, then I ask it to generate a specification file, then I refine it, ask it to generate a new specification file, and so on. Once I’m happy with the spec, I ask an LLM to generate a minimal implementation of it. Usually writing the spec and a first implementation is a task shared between Claude and Gemini (through Antigravity). Then I open a pull request and ask GitHub Copilot to review it (usually with GPT-5.x). I repeat this process until I’m happy with the implementation. I always ask for documentation and unit tests (and golden tests when relevant, more on this later).</p>
<p>I started to really believe that I had something interesting, so I gave it a shot, and called it <strong>T</strong>. I had long joked that the natural successor to R should be called T (because R is the successor to S… and no, I’m not going to call it Q because that sounds like the word for ass in French).</p>
<p>Something else that made me confident I could succeed, besides my own hubris, was that I am pretty familiar with unit testing, test-driven development, trunk-based development and Nix. When you combine all these elements, it makes developing with LLMs quite safe.</p>
<p>So I just started prompting. And now I’m quite happy to announce that there is a beta version of T that you can use today!</p>
<p>By leveraging Nix as a build engine, T can treat complex data science workflows as buildable derivations. A typical T pipeline looks like this:</p>
<pre>p = pipeline {
  -- 1. Python node: read data with pandas
  mtcars_pl = pyn(
    command = &lt;{
import pandas as pd
pd.read_csv(&quot;data/mtcars.csv&quot;, sep=&quot;|&quot;)
    }&gt;,
    include = [&quot;data/mtcars.csv&quot;],
    serializer = ^csv
  )

  -- 2. Python node: filter and serialize as CSV
  mtcars_pl_am = pyn(
    command = &lt;{
mtcars_pl[mtcars_pl['am'] == 1]
    }&gt;,
    deserializer = ^csv,
    serializer = ^csv
  )

  -- 3. R node: read CSV and take head using functions.R
  mtcars_head = rn(
    command = &lt;{
my_head(mtcars_pl_am)
    }&gt;,
    functions = [&quot;src/functions.R&quot;],
    deserializer = ^csv,
    serializer = ^csv
  )

  -- 4. R node: select column with dplyr
  mtcars_mpg = rn(
    command = &lt;{
library(dplyr)
mtcars_head %&gt;% select(mpg)
    }&gt;,
    deserializer = ^csv,
    serializer = ^csv
  )

  -- Render Quarto report
  report = node(script = &quot;src/report.qmd&quot;, runtime = Quarto)
}

-- Materialize the pipeline
populate_pipeline(p, build = true)
pipeline_copy() -- Copy the outputs from the Nix store to your working directory</pre>
<p>As you can see, each node has a <code>command</code> argument where you can write literal R or Python code. It is also possible to provide the path to a script instead. If packages need to be loaded for the code to work, you can just write the calls to load the required packages in the <code>command</code> argument as well.</p>
<p>While T is heavily inspired by the <code>{targets}</code> package in R, it takes the concept a step further by making pipelines <strong>first-class objects</strong> within the language itself. This means you can:</p>
<ul>
<li><strong>Compose Pipelines</strong>: You can define small, modular pipelines and then merge them into larger ones using standard operators.</li>
<li><strong>Static Analysis</strong>: Because the DAG (Directed Acyclic Graph) is defined within the language, T can validate your entire workflow (checking for circular dependencies or missing data) before a single line of code even runs.</li>
<li><strong>Heterogeneous Execution</strong>: A single pipeline can effortlessly mix R, Python, and native T code. Data is passed between these nodes using built-in serializers like <code>^csv</code>, <code>^arrow</code>, or even specialized formats like <code>^pmml</code> for traditional models and <code>^onnx</code> for deep learning architectures. It is also possible to define your own serializers.</li>
<li><strong>Immutable State</strong>: Each node output is managed by Nix, meaning if you haven’t changed the code or the data for a specific node, T (via Nix) will simply pull the cached result from previous runs.</li>
</ul>
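<p>The static-analysis bullet above boils down to classic checks on the dependency graph. Here is a minimal Python sketch of that kind of validation (not T's actual implementation; <code>validate_dag</code> is a made-up name), using Kahn's algorithm to detect cycles and a simple lookup to catch missing dependencies before anything runs:</p>

```python
def validate_dag(nodes):
    """nodes: dict mapping node name -> list of dependency names.
    Returns a valid execution order, or raises before any code runs."""
    missing = [d for deps in nodes.values() for d in deps if d not in nodes]
    if missing:
        raise ValueError(f"missing dependencies: {missing}")
    # Kahn's algorithm: repeatedly consume nodes whose deps are satisfied.
    indegree = {n: len(deps) for n, deps in nodes.items()}
    dependents = {n: [] for n in nodes}
    for n, deps in nodes.items():
        for d in deps:
            dependents[d].append(n)
    ready = [n for n, deg in indegree.items() if deg == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for m in dependents[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        # Some nodes were never freed: a circular dependency exists.
        raise ValueError("circular dependency detected")
    return order
```

<p>Because the whole DAG is a value inside the language, this kind of check can run at definition time rather than halfway through a long build.</p>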
<p>But don’t let the “orchestrator” label fool you; T is also a capable language in its own right. It features a selection of built-in packages inspired by the <code>tidyverse</code> for data manipulation. Thanks to its Arrow backend, it is surprisingly fast. I even maintain a CI benchmark running on NYC Taxi data to ensure performance remains competitive.</p>
<p>I made sure that T is pretty easy to use with LLMs by providing a file called <code>summary.md</code> in the root of the GitHub repository. This file is meant to be used by LLMs to quickly learn the language’s syntax and generate code accordingly. You could also provide the whole help documentation to the LLM (found in the repository under <code>help/docs.json</code>), but I found that a summary is usually enough. There is also another experimental feature I’m thinking about, called <code>intent</code> blocks. These blocks would essentially be first-class structured comments used to anchor an LLM’s behaviour and make it more deterministic. They would be parsed by T and used to generate code accordingly. I have some ideas about how these could look, something like this:</p>
<pre>intent {
  description: &quot;Customer churn prediction&quot;,
  assumptions: [&quot;Age &gt; 18&quot;, &quot;NA imputed with multiple imputation&quot;],
  requires: [&quot;dataset.csv&quot;]
}</pre>
<section id="is-this-slop" class="level2">
<h2 class="anchored" data-anchor-id="is-this-slop">Is this slop?</h2>
<p>There’s a lot of skepticism about building your own language using LLMs, and I get it. I was pretty skeptical myself. So let me tell you what actually gives me confidence in T’s correctness: as of writing, 1753 unit tests, 122 golden tests, 13 end-to-end tests, and 18 full project demos are executed on every push and PR, on both Linux and macOS via GitHub Actions. That’s the verification regime, and it has to be rigorous precisely because I can’t audit the OCaml implementation by eye. This is actually one of the more interesting lessons from this project: when you can’t rely on code review, you have to over-invest in tests and specifications. The spec files, the enriched changelog, the <code>summary.md</code>, all of that context makes the LLM’s output more predictable, and the test suite tells you immediately when it isn’t.</p>
<p>From personal experience, when I generate R or Python code, the output looks a lot like what I would have written myself. The main failure mode I’ve noticed is lack of context: the more you give the model, the better the result. Letting separate LLMs review PRs and iterating through several loops helps catch what any single model misses.</p>
<p>I’m also confident in T’s safety from a different angle: it’s ultimately orchestrating Python and R code you write yourself, and that you can test independently.</p>
</section>
<section id="interested" class="level2">
<h2 class="anchored" data-anchor-id="interested">Interested?</h2>
<p>If you’re interested in trying it out or contributing, check out the <a href="https://github.com/b-rodrigues/tlang" rel="nofollow" target="_blank">official repository</a> or the <a href="https://tstats-project.org/" rel="nofollow" target="_blank">website</a>, and don’t hesitate to open an issue or a PR or contact me on the dedicated Matrix (https://matrix.to/#/#tproject:matrix.org) channel.</p>
</section>
<section id="appendix" class="level1">
<h1>Appendix</h1>
<p>For the interested reader, here’s how to get started with T.</p>
<section id="how-to-get-started" class="level2">
<h2 class="anchored" data-anchor-id="how-to-get-started">How to get started</h2>
<p>If you have Nix installed, getting started with a new project is just a single command away:</p>
<pre># 1. Initialize a new project
nix run github:b-rodrigues/tlang -- init --project my_t_project
cd my_t_project</pre>
<p>There will be no other way to start a T project. As explained above, I don’t want to have to deal with providing OS-specific binaries, and since Nix is used by T as the build engine, you’ll need to have Nix installed on your system anyway. Might as well reuse it to manage the installation of T itself!</p>
<p>Inside the project’s folder, you’ll find a <code>tproject.toml</code> file. This is where you list the R and Python packages you’ll need. For example:</p>
<pre>[project]
name = &quot;r_py_xgboost_t&quot;
description = &quot;A T data analysis project&quot;

[dependencies]
# T packages this project depends on
# Format: package = { git = &quot;repository-url&quot;, tag = &quot;version&quot; }
# Example:
# stats = { git = &quot;https://github.com/t-lang/stats&quot;, tag = &quot;v0.5.0&quot; }

[r-dependencies]
packages = [&quot;dplyr&quot;, &quot;yardstick&quot;]

[py-dependencies]
version = &quot;python313&quot;
packages = [&quot;numpy&quot;, &quot;pandas&quot;, &quot;scikit-learn&quot;, &quot;xgboost&quot;]

[additional-tools]
packages = [&quot;quarto&quot;]

[t]
# Minimum T language version required
min_version = &quot;0.51.2&quot;</pre>
<p>Under “additional tools” you can add any package that is available in <code>nixpkgs</code>. If you need LaTeX, you can also add this dedicated section:</p>
<pre>\(\)
packages = [&quot;amsmath&quot;, &quot;geometry&quot;, &quot;hyperref&quot;, &quot;biblatex&quot;]</pre>
<p>You may have noticed that there is also a section for T packages; that’s right, T supports user-defined packages. Instead of starting a project you’d start a package:</p>
<pre>nix run github:b-rodrigues/tlang -- init --package my_package
cd my_package</pre>
<p>Instead of a <code>tproject.toml</code> file, you’ll have to fill a <code>DESCRIPTION.toml</code> file:</p>
<pre>[package]
name = &quot;my_package&quot;
version = &quot;0.1.0&quot;
description = &quot;A brief description of what my_package does&quot;
authors = [&quot;brodriguesco&quot;]
license = &quot;EUPL-1.2&quot;
homepage = &quot;&quot;
repository = &quot;&quot;

[dependencies]
# T packages this package depends on
# Format: package = { git = &quot;repository-url&quot;, tag = &quot;version&quot; }

[t]
# Minimum T language version required
min_version = &quot;0.5.0&quot;</pre>
<p>Another important file is the <code>flake.nix</code> that will be automatically generated. You shouldn’t have to touch it, but this <code>flake.nix</code> is what provides the reproducible development environment for running your project. To do so, simply use:</p>
<pre>nix develop</pre>
<p>This will install T and activate the environment. If you’ve added stuff to the <code>tproject.toml</code> you’ll have to run <code>t update</code> to sync the packages to the flake, and then rebuild the environment (you’ll need to exit the development environment with <code>exit</code> and rebuild it using <code>nix develop</code> again). Oh and by the way, T requires a Linux-like environment so if you’re on Windows, you’ll have to run T within <strong>WSL2</strong> (Windows Subsystem for Linux).</p>
<p>Once inside the <code>nix develop</code> shell, everything you need (the T interpreter, your specific versions of R/Python, and all project tools) is ready to use. You don’t need to manage virtual environments or Docker containers manually; T handles the heavy lifting via Nix under the hood.</p>
<p>You can browse examples on this <a href="https://github.com/b-rodrigues/t_demos" rel="nofollow" target="_blank">repository</a>.</p>
</section>
<section id="tooling-and-editor-support" class="level2">
<h2 class="anchored" data-anchor-id="tooling-and-editor-support">Tooling and Editor Support</h2>
<p>A language is only as good as its developer experience. I politely asked LLMs to implement a full Language Server (<strong>LSP</strong>) for T, which provides autocompletion, real-time diagnostics, and “Go to Definition” support.</p>
<ul>
<li>For <strong>VS Code / Positron</strong>: A dedicated extension providing syntax highlighting and LSP integration.</li>
<li>For <strong>Vim / Emacs</strong>: Detailed configuration guides and syntax files are available.</li>
<li>For <strong>Quarto</strong>: T is fully compatible with Quarto for literate programming, allowing you to run executable <code>{t}</code> chunks directly in your documents.</li>
</ul>
<p>For detailed setup instructions, check out the <a href="https://github.com/b-rodrigues/tlang/blob/main/docs/editors.md" rel="nofollow" target="_blank">Editor Support guide</a> in the official documentation.</p>
<p>There’s much more I haven’t covered here, so <a href="https://github.com/b-rodrigues/tlang" rel="nofollow" target="_blank">check out the official repository</a> or the <a href="https://tstats-project.org/" rel="nofollow" target="_blank">website</a>.</p>


</section>
</section>

 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://b-rodrigues.github.io/posts/2026-04-03-tproject.html"> Econometrics and Free Software</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/you-can-just-build-your-own-programming-language/">You can just build your own programming language</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400328</post-id>	</item>
		<item>
		<title>AI agents can create convincing ecological models, but you still need to know what you’re doing</title>
		<link>https://www.r-bloggers.com/2026/04/ai-agents-can-create-convincing-ecological-models-but-you-still-need-to-know-what-youre-doing/</link>
		
		<dc:creator><![CDATA[Seascapemodels]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 13:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.seascapemodels.org/posts/2026-03-28-agentic-AI-ecological-modelling/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Agentic AI tools like Claude Code can write and run code, fix its own errors, and produce a formatted report with figures. I wanted to know whether that translates into reliable ecological modelling, so we ran a test: three fisheries tasks, four...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/ai-agents-can-create-convincing-ecological-models-but-you-still-need-to-know-what-youre-doing/">AI agents can create convincing ecological models, but you still need to know what you’re doing</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.seascapemodels.org/posts/2026-03-28-agentic-AI-ecological-modelling/"> Seascapemodels</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<p>Agentic AI tools like Claude Code can write and run code, fix its own errors, and produce a formatted report with figures. I wanted to know whether that translates into reliable ecological modelling, so we ran a test: three fisheries tasks, four AI models, ten independent runs each, scored against a rubric. The results are published in <a href="https://doi.org/10.1111/faf.70079" rel="nofollow" target="_blank">Fish and Fisheries</a>.</p>
<p>We found agents can be genuinely useful, but only if you know how to use them well and only if you know enough about the analysis to catch what they miss.</p>
<section id="how-we-did-our-tests" class="level2">
<h2 class="anchored" data-anchor-id="how-we-did-our-tests">How we did our tests</h2>
<p>We used <a href="https://roo.cline.bot/" rel="nofollow" target="_blank">Roo Code</a>, an agentic AI that runs inside VS Code. Unlike a chatbot, it can write code, execute it, read error messages, and iterate autonomously. There are many popular agentic AI tools; Claude Code is the most popular right now. We chose Roo Code because it is open source and fully customisable.</p>
<p>We gave it detailed specification sheets and asked it to complete three tasks. One was a common ecological modelling task: fitting a generalized linear model (GLM) of fish abundance against coral habitat. The other two were tasks specialised to fisheries modelling: fitting a von Bertalanffy growth curve and running a yield per recruit analysis. We chose these because they are common in the ecological sciences, but specialised enough that LLMs probably haven’t seen many examples in their training data.</p>
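<p>For readers unfamiliar with the second task: the von Bertalanffy growth curve is the textbook form L(t) = L_inf * (1 - exp(-k * (t - t0))). A minimal Python sketch (the parameter values in the test are illustrative, not the study's data):</p>

```python
import math

def von_bertalanffy(t, l_inf, k, t0):
    """Expected length at age t: asymptotic length l_inf,
    growth rate coefficient k, theoretical age at zero length t0."""
    return l_inf * (1 - math.exp(-k * (t - t0)))
```

<p>Fitting means estimating <code>l_inf</code>, <code>k</code>, and <code>t0</code> from length-at-age data, typically by nonlinear least squares.</p>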
<p>We ran each task 10 times. LLM responses have some randomness, and this multiplies when doing long-running tasks. So consistency is as important to measure as their best performance. We scored every output against a rubric covering accuracy, code quality, and report quality.</p>
<p>We used four LLMs: two proprietary models, Claude Sonnet 4.0 and Sonnet 4.5 (which came out during review, so we added it later), and one open-weight model, Kimi K2, in both its standard and ‘exacto’ variants.</p>
<p>During review, Kimi K2 ‘exacto’ became available on the <a href="https://openrouter.ai/" rel="nofollow" target="_blank">OpenRouter</a> platform, so we added that. The exacto variant routes requests to the providers with the best performance; some providers run it cheaply. Long story short, exacto performed much better than requesting just any provider’s version of K2, which highlights the importance of running open-weight models on quality hardware.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://i1.wp.com/www.seascapemodels.org/posts/2026-03-28-agentic-AI-ecological-modelling/AI-agent-workflows.png?w=578&#038;ssl=1" class="img-fluid figure-img" data-recalc-dims="1"></p>
<figcaption>Agentic workflows vs AI assisted coding</figcaption>
</figure>
</div>
</section>
<section id="how-to-use-agentic-ai-for-ecological-modelling" class="level2">
<h2 class="anchored" data-anchor-id="how-to-use-agentic-ai-for-ecological-modelling">How to use agentic AI for ecological modelling</h2>
<p>We learned several key lessons about how to get the best out of agentic AI for ecological modelling.</p>
<p><strong>Write a detailed specification sheet.</strong> Our sheets ran to multiple pages covering analysis aims, data structure, recommended R functions and packages, expected outputs, and file naming conventions. This takes time, but writing a specification forces you to think carefully about what you actually want. <a href="https://github.com/cbrown5/agentic-ai-fisheries/blob/main/Scripts/glm-test-case/glm-readme.md" rel="nofollow" target="_blank">Here’s an example</a>.</p>
<p><strong>Specify the algorithms explicitly.</strong> Agents default to the most common method in their training data, which may not be appropriate for your question. If you want bootstrapped confidence intervals via the <code>boot</code> package, say so.</p>
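<p>As an illustration of the kind of algorithm worth spelling out explicitly, here is a percentile-bootstrap confidence interval sketched in plain Python (the study itself used R; <code>bootstrap_ci</code> is a hypothetical helper written for this post, not code from the paper):</p>

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap CI: resample with replacement, compute the
    statistic each time, and take the empirical alpha/2 quantiles."""
    rng = random.Random(seed)
    reps = sorted(
        stat([rng.choice(data) for _ in data]) for _ in range(n_boot)
    )
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

<p>A spec sheet that names the method (percentile bootstrap), the number of replicates, and the alpha level leaves the agent far less room to silently substitute a different interval.</p>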
<p>Even then, they may not comply: both Claude models in our study repeatedly applied natural mortality to the first age class in the yield per recruit model despite explicit instructions not to. That’s a subtle error that affected catch estimates—the numbers that would inform fishery management. These quirks of agent behaviour highlight why expert supervision is essential.</p>
<p><strong>Run replicates and compare outputs.</strong> Accuracy scores varied substantially between runs: sometimes the agent nailed every parameter; sometimes it got some parts correct but made systematic errors in other parts of the analysis. Running multiple agents and comparing outputs is one way to identify the best solutions.</p>
<p><strong>Check the things the agent doesn’t know to check.</strong> None of our agents checked for collinearity between predictors in the GLM, even though it’s standard practice. We deliberately left it out of the specification to see whether they’d check anyway. The GLMs ran fine and the results looked coherent, but there was in fact strong collinearity between the predictors. The lesson here is that the agents are good at coding, but their conceptual implementation may be misleading, incomplete or logically flawed.</p>
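<p>The check the agents skipped is cheap to run yourself. A plain-Python sketch that flags highly correlated predictor pairs (purely illustrative: the function names and the data in the test are made up, and a real workflow would typically compute variance inflation factors, e.g. with <code>car::vif</code> in R):</p>

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def collinear_pairs(predictors, threshold=0.8):
    """predictors: dict name -> list of values.
    Returns (name_a, name_b, r) for every pair with |r| > threshold."""
    names = list(predictors)
    return [
        (a, b, r)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(r := pearson(predictors[a], predictors[b])) > threshold
    ]
```

<p>Running something like this on the design matrix before fitting would have surfaced the problem immediately, regardless of what the agent did.</p>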
</section>
<section id="the-biggest-problem-with-agentic-ai-is-that-it-can-produce-professionally-formatted-output-that-contains-logical-errors" class="level2">
<h2 class="anchored" data-anchor-id="the-biggest-problem-with-agentic-ai-is-that-it-can-produce-professionally-formatted-output-that-contains-logical-errors">The biggest problem with agentic AI is that it can produce professionally formatted output that contains logical errors</h2>
<p>The error type that concerns me most is professionally formatted output containing logical errors.</p>
<p>In our results we saw growth curves that plotted beautifully but used the wrong confidence interval method, and a yield analysis that applied mortality in the wrong sequence. A coding syntax error is immediately obvious. A methodological shortcut embedded in otherwise clean output may be invisible unless you already know what the answer should look like.</p>
<p>There is a genuine risk that inexperienced researchers will use these tools to produce analyses they cannot evaluate. Experienced researchers may also get overconfident and not check results thoroughly enough. These flaws can then leak through to applications, as we’ve seen where human errors in <a href="https://pnas.org/doi/10.1073/pnas.2426166122" rel="nofollow" target="_blank">ecological modelling impact decisions on invasive species</a>.</p>
<p>For scientists with strong quantitative foundations, agents offer a real efficiency gain. The specification sheets and rubrics from our study are in the supplemental materials if you want to adapt them. All our code is available on GitHub if you want to run your own tests (<a href="https://github.com/cbrown5/agentic-ai-fisheries/tree/main/Scripts" rel="nofollow" target="_blank">check this folder; each modelling ‘test-case’ has the specification sheet and other files</a>).</p>
<p>The paper is open access: <a href="https://doi.org/10.1111/faf.70079" rel="nofollow" target="_blank">Brown et al. 2026, Fish and Fisheries</a>.</p>


</section>

 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.seascapemodels.org/posts/2026-03-28-agentic-AI-ecological-modelling/"> Seascapemodels</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/ai-agents-can-create-convincing-ecological-models-but-you-still-need-to-know-what-youre-doing/">AI agents can create convincing ecological models, but you still need to know what you’re doing</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400276</post-id>	</item>
		<item>
		<title>A Better R Programming Experience Thanks to Tree-sitter</title>
		<link>https://www.r-bloggers.com/2026/04/a-better-r-programming-experience-thanks-to-tree-sitter/</link>
		
		<dc:creator><![CDATA[rOpenSci]]></dc:creator>
		<pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://ropensci.org/blog/2026/04/02/tree-sitter-overview/</guid>

					<description><![CDATA[<p>A little bit less than two years ago, building on work by Jim Hester and Kevin Ushey, Davis Vaughan completed a very impactful JavaScript file for the R community: an R grammar for the Tree-sitter parsing generator. He even got a round of applause for...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/a-better-r-programming-experience-thanks-to-tree-sitter/">A Better R Programming Experience Thanks to Tree-sitter</a>]]></description>
<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/"> rOpenSci - open tools for open science</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>A little bit less than two years ago, building on work by Jim Hester and Kevin Ushey, Davis Vaughan completed a very impactful JavaScript file for the R community: an R grammar for the Tree-sitter parsing generator. He even got a round of applause for it during a talk at the useR! 2024 conference! So, did he get cheered for… grammatical rules in a <a href="https://github.com/r-lib/tree-sitter-r/blob/next/grammar.js" rel="nofollow" target="_blank">JavaScript file</a>? <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f605.png" alt="😅" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<p>No, the audience was excited about the <em>improved developer experience for R</em> that this file unlocked. R tooling around Tree-sitter is how you get</p>
<ul>
<li>reformatting through <a href="https://posit-dev.github.io/air/" rel="nofollow" target="_blank">Air</a> and linting through <a href="https://jarl.etiennebacher.com/" rel="nofollow" target="_blank">Jarl</a>;</li>
<li>auto-completion or help on hover in the <a href="https://lionel-.github.io/slidedecks/2024-07-11-ark" rel="nofollow" target="_blank">Positron IDE</a>;</li>
<li>better <a href="https://github.com/orgs/community/discussions/120397" rel="nofollow" target="_blank">search</a> for R on GitHub;</li>
<li>and more!</li>
</ul>
<p>In this post, we’ll explain what Tree-sitter is, and how tools built on Tree-sitter can benefit your R development workflow.</p>
<h2>
Code parsing: what is Tree-sitter?
</h2><p><a href="https://tree-sitter.github.io/tree-sitter/" rel="nofollow" target="_blank">Tree-sitter</a> is a parser generator and incremental parsing library written in C, with bindings in several languages including Rust (and R!).</p>
<p>Let’s rewind a little bit. What does it mean to parse code?</p>
<p>Basically, given a string of code like</p>
<pre>a &lt;- mean(x, na.rm = TRUE)
</pre><p>how do you know that <code>mean</code> is a function name, <code>na.rm</code> an argument name, and <code>TRUE</code> a logical? You have to <em>parse</em> that code into what’s called a parse tree. You do that in your head when reading R code. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f638.png" alt="😸" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<p>R itself can obviously parse R code, thanks to its <a href="https://github.com/wch/r-source/blob/trunk/src/main/gram.y" rel="nofollow" target="_blank">grammar</a>. See for instance the <a href="https://github.com/wch/r-source/commit/a1425adea54bcc98eef86081522b5dbb3e149cdc#diff-ba804d7fa3fa053c1f57d46369f4432cb55c9c4f69f46ae6510d0d1fcc59f382" rel="nofollow" target="_blank">commit that introduced R’s native pipe</a>, which required extending R’s syntax and thus modifying its grammar.</p>
<p>You can use <a href="https://rdrr.io/r/base/parse.html" rel="nofollow" target="_blank"><code>parse()</code></a> and <a href="https://rdrr.io/r/utils/getParseData.html" rel="nofollow" target="_blank"><code>getParseData()</code></a> to parse R code.</p>
<div class="highlight">
<pre>parse(
 text = &quot;a &lt;- mean(x, na.rm = TRUE)&quot;,
 keep.source = TRUE
) |&gt;
 getParseData()
#&gt;    line1 col1 line2 col2 id parent                token terminal  text
#&gt; 23     1    1     1   26 23      0                 expr    FALSE      
#&gt; 1      1    1     1    1  1      3               SYMBOL     TRUE     a
#&gt; 3      1    1     1    1  3     23                 expr    FALSE      
#&gt; 2      1    3     1    4  2     23          LEFT_ASSIGN     TRUE    &lt;-
#&gt; 21     1    6     1   26 21     23                 expr    FALSE      
#&gt; 4      1    6     1    9  4      6 SYMBOL_FUNCTION_CALL     TRUE  mean
#&gt; 6      1    6     1    9  6     21                 expr    FALSE      
#&gt; 5      1   10     1   10  5     21                  '('     TRUE     (
#&gt; 7      1   11     1   11  7      9               SYMBOL     TRUE     x
#&gt; 9      1   11     1   11  9     21                 expr    FALSE      
#&gt; 8      1   12     1   12  8     21                  ','     TRUE     ,
#&gt; 13     1   14     1   18 13     21           SYMBOL_SUB     TRUE na.rm
#&gt; 14     1   20     1   20 14     21               EQ_SUB     TRUE     =
#&gt; 15     1   22     1   25 15     16            NUM_CONST     TRUE  TRUE
#&gt; 16     1   22     1   25 16     21                 expr    FALSE      
#&gt; 17     1   26     1   26 17     21                  ')'     TRUE     )
</pre>
</div>
<p>Or you could transform that same data into XML using Gábor Csárdi’s <a href="https://r-lib.github.io/xmlparsedata/" rel="nofollow" target="_blank">{xmlparsedata}</a>:</p>
<div class="highlight">
<pre>parse(
 text = &quot;a &lt;- mean(x, na.rm = TRUE)&quot;,
 keep.source = TRUE
) |&gt;
 xmlparsedata::xml_parse_data(pretty = TRUE) |&gt;
 xml2::read_xml() |&gt;
 as.character() |&gt;
 cat()
#&gt; &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; standalone=&quot;yes&quot;?&gt;
#&gt; &lt;exprlist&gt;
#&gt;   &lt;expr line1=&quot;1&quot; col1=&quot;1&quot; line2=&quot;1&quot; col2=&quot;26&quot; start=&quot;28&quot; end=&quot;53&quot;&gt;
#&gt;     &lt;expr line1=&quot;1&quot; col1=&quot;1&quot; line2=&quot;1&quot; col2=&quot;1&quot; start=&quot;28&quot; end=&quot;28&quot;&gt;
#&gt;       &lt;SYMBOL line1=&quot;1&quot; col1=&quot;1&quot; line2=&quot;1&quot; col2=&quot;1&quot; start=&quot;28&quot; end=&quot;28&quot;&gt;a&lt;/SYMBOL&gt;
#&gt;     &lt;/expr&gt;
#&gt;     &lt;LEFT_ASSIGN line1=&quot;1&quot; col1=&quot;3&quot; line2=&quot;1&quot; col2=&quot;4&quot; start=&quot;30&quot; end=&quot;31&quot;&gt;&lt;-&lt;/LEFT_ASSIGN&gt;
#&gt;     &lt;expr line1=&quot;1&quot; col1=&quot;6&quot; line2=&quot;1&quot; col2=&quot;26&quot; start=&quot;33&quot; end=&quot;53&quot;&gt;
#&gt;       &lt;expr line1=&quot;1&quot; col1=&quot;6&quot; line2=&quot;1&quot; col2=&quot;9&quot; start=&quot;33&quot; end=&quot;36&quot;&gt;
#&gt;         &lt;SYMBOL_FUNCTION_CALL line1=&quot;1&quot; col1=&quot;6&quot; line2=&quot;1&quot; col2=&quot;9&quot; start=&quot;33&quot; end=&quot;36&quot;&gt;mean&lt;/SYMBOL_FUNCTION_CALL&gt;
#&gt;       &lt;/expr&gt;
#&gt;       &lt;OP-LEFT-PAREN line1=&quot;1&quot; col1=&quot;10&quot; line2=&quot;1&quot; col2=&quot;10&quot; start=&quot;37&quot; end=&quot;37&quot;&gt;(&lt;/OP-LEFT-PAREN&gt;
#&gt;       &lt;expr line1=&quot;1&quot; col1=&quot;11&quot; line2=&quot;1&quot; col2=&quot;11&quot; start=&quot;38&quot; end=&quot;38&quot;&gt;
#&gt;         &lt;SYMBOL line1=&quot;1&quot; col1=&quot;11&quot; line2=&quot;1&quot; col2=&quot;11&quot; start=&quot;38&quot; end=&quot;38&quot;&gt;x&lt;/SYMBOL&gt;
#&gt;       &lt;/expr&gt;
#&gt;       &lt;OP-COMMA line1=&quot;1&quot; col1=&quot;12&quot; line2=&quot;1&quot; col2=&quot;12&quot; start=&quot;39&quot; end=&quot;39&quot;&gt;,&lt;/OP-COMMA&gt;
#&gt;       &lt;SYMBOL_SUB line1=&quot;1&quot; col1=&quot;14&quot; line2=&quot;1&quot; col2=&quot;18&quot; start=&quot;41&quot; end=&quot;45&quot;&gt;na.rm&lt;/SYMBOL_SUB&gt;
#&gt;       &lt;EQ_SUB line1=&quot;1&quot; col1=&quot;20&quot; line2=&quot;1&quot; col2=&quot;20&quot; start=&quot;47&quot; end=&quot;47&quot;&gt;=&lt;/EQ_SUB&gt;
#&gt;       &lt;expr line1=&quot;1&quot; col1=&quot;22&quot; line2=&quot;1&quot; col2=&quot;25&quot; start=&quot;49&quot; end=&quot;52&quot;&gt;
#&gt;         &lt;NUM_CONST line1=&quot;1&quot; col1=&quot;22&quot; line2=&quot;1&quot; col2=&quot;25&quot; start=&quot;49&quot; end=&quot;52&quot;&gt;TRUE&lt;/NUM_CONST&gt;
#&gt;       &lt;/expr&gt;
#&gt;       &lt;OP-RIGHT-PAREN line1=&quot;1&quot; col1=&quot;26&quot; line2=&quot;1&quot; col2=&quot;26&quot; start=&quot;53&quot; end=&quot;53&quot;&gt;)&lt;/OP-RIGHT-PAREN&gt;
#&gt;     &lt;/expr&gt;
#&gt;   &lt;/expr&gt;
#&gt; &lt;/exprlist&gt;
</pre>
</div>
<p>In both cases, you recognize words such as <code>LEFT_ASSIGN</code> or <code>SYMBOL_FUNCTION_CALL</code>. Parsing is an essential step before the code is actually executed, but parsed code can also be used for other purposes, such as analyzing code without brittle regular expressions (does it call a particular <a href="https://nrennie.rbind.io/blog/how-to-make-your-own-rstats-wrapped/" rel="nofollow" target="_blank">function</a>?), navigating code (going from a function call to the definition of that function), or modifying code (replacing all occurrences of a function with another one).</p>
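<p>As a small illustration of that first use case, here is a minimal sketch, using only base R and the <code>getParseData()</code> output shown above, of answering “does this code call a particular function?” more robustly than a regular expression would (the helper name <code>calls_function</code> is made up for this example):</p>

```r
# Check whether a snippet of R code calls a given function, using parse data
# rather than regex. Unlike grepl("mean\\(", code), this is not fooled by
# comments, strings, or similarly named functions such as rowmean().
calls_function <- function(code, fun) {
  parsed <- parse(text = code, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)
  any(tokens$token == "SYMBOL_FUNCTION_CALL" & tokens$text == fun)
}

calls_function("a <- mean(x, na.rm = TRUE)", "mean")       # TRUE
calls_function("# mean is mentioned in a comment", "mean") # FALSE
```

The same idea, expressed over Tree-sitter nodes instead of R's parse data, is what the query-based tools later in this post build on.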
<p>Now, Tree-sitter performs this same code parsing, but <strong>faster</strong>, notably thanks to its support for incremental parsing, which is key to updating the syntax tree as you type in your editor! Tree-sitter is language-agnostic in that it can parse any code as long as there is a grammar for it (think Rosetta Stone plugins). It has been used for many languages, which means many tools have been built around it.</p>
<p>To have Tree-sitter “learn” a new language, you need to give it a file containing the definition of that language’s syntax: a <em>grammar</em>. This is where the aforementioned JavaScript file by Davis Vaughan and collaborators comes into play! The <a href="https://github.com/r-lib/tree-sitter-r" rel="nofollow" target="_blank">tree-sitter-r repo</a>, which provides a translation of the R grammar into the format expected by Tree-sitter, is the basis of all tools presented in this post that take R code as input.</p>
<p>The {treesitter} R package allows us to use Tree-sitter from R. Here’s how to parse the same code as earlier with it; we need the <code>language()</code> function from {treesitter.r}<sup id="fnref:1"><a href="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/#fn:1" class="footnote-ref" role="doc-noteref" rel="nofollow" target="_blank">1</a></sup>.</p>
<div class="highlight">
<pre>library(treesitter)
#&gt; 
#&gt; Attaching package: 'treesitter'
#&gt; The following object is masked from 'package:base':
#&gt; 
#&gt;     range
language &lt;- treesitter.r::language()
parser &lt;- parser(language)
text &lt;- &quot;a &lt;- mean(x, na.rm = TRUE)&quot;
parser_parse(parser, text)
#&gt; &lt;tree_sitter_tree&gt;
#&gt; 
#&gt; ── Text ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#&gt; a &lt;- mean(x, na.rm = TRUE)
#&gt; 
#&gt; ── S-Expression ────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#&gt; (program [(0, 0), (0, 26)]
#&gt;   (binary_operator [(0, 0), (0, 26)]
#&gt;     lhs: (identifier [(0, 0), (0, 1)])
#&gt;     operator: &quot;&lt;-&quot; [(0, 2), (0, 4)]
#&gt;     rhs: (call [(0, 5), (0, 26)]
#&gt;       function: (identifier [(0, 5), (0, 9)])
#&gt;       arguments: (arguments [(0, 9), (0, 26)]
#&gt;         open: &quot;(&quot; [(0, 9), (0, 10)]
#&gt;         argument: (argument [(0, 10), (0, 11)]
#&gt;           value: (identifier [(0, 10), (0, 11)])
#&gt;         )
#&gt;         (comma [(0, 11), (0, 12)])
#&gt;         argument: (argument [(0, 13), (0, 25)]
#&gt;           name: (identifier [(0, 13), (0, 18)])
#&gt;           &quot;=&quot; [(0, 19), (0, 20)]
#&gt;           value: (true [(0, 21), (0, 25)])
#&gt;         )
#&gt;         close: &quot;)&quot; [(0, 25), (0, 26)]
#&gt;       )
#&gt;     )
#&gt;   )
#&gt; )
</pre>
</div>
<p>Tree-sitter is the workhorse of many tools, shown in the diagram below. All of them depend on Tree-sitter and the R grammar provided to it. Some of them are command-line interfaces (CLIs), while others are R packages.</p>
<figure><img src="https://i0.wp.com/ropensci.org/blog/2026/04/02/tree-sitter-overview/tree-sitter.png?w=578&#038;ssl=1"
alt="Diagram of Tree-sitter tooling for R. At the center is Tree-sitter especially its Rust bindings and the R grammar for treesitter. At the top is the input, R scripts. At the bottom from treesitter is the treesitter R package,; ast-grep that is used by astgrepr which is used by flir and that is used by the CLAUDE.md instructions for parsing code; Air that is used by Jarl; Ark that is used by the Positron IDE; R code browsing on GitHub." data-recalc-dims="1">
</figure>
<h2>
Browsing code interactively: Positron IDE, GitHub
</h2><p>The real reason the audience applauded Davis Vaughan is that he explained how the R grammar for Tree-sitter had been <a href="https://github.com/orgs/community/discussions/120397" rel="nofollow" target="_blank">deployed to GitHub</a>, so that browsing R code on GitHub is now almost as good an experience as browsing, say, JavaScript code. If you search for a function name in a repository, for instance, its definition is surfaced in the search results. See <a href="https://www.youtube.com/watch?v=Gm0ikRBAfwc" rel="nofollow" target="_blank">Davis’ slides</a> (also available as a <a href="https://github.com/DavisVaughan/2024-07-09_useR-2024" rel="nofollow" target="_blank">PDF</a>), or refer to the video below showing how typing <code>vetiver_model</code> in the search bar of the R vetiver repo makes the function definition the first result, which one can click to jump straight to the definition.</p>
<video controls preload="auto" width="450" playsinline class="html-video">
<source src="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/searching-for-vetiver-model-r-new.mp4" type="video/mp4">
<span></span>
</video>
<p>Also very useful is the use of Tree-sitter by <a href="https://github.com/posit-dev/ark" rel="nofollow" target="_blank">Ark</a>, the R kernel <a href="https://lionel-.github.io/slidedecks/2024-07-11-ark/#/language-server-protocol-1" rel="nofollow" target="_blank">used in the Positron IDE</a>. Ark is how you get autocompletion and help on hover in Positron. The video below shows how you can extend the selection to further steps of a pipeline in Positron.</p>
<video controls preload="auto" width="450" playsinline class="html-video">
<source src="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/expand-selection-2.mp4" type="video/mp4">
<span></span>
</video>
<p>This use case of Tree-sitter is also featured in <a href="https://www.youtube.com/watch?v=Gm0ikRBAfwc" rel="nofollow" target="_blank">Davis’ slides</a>. See also Lionel Henry’s and Davis Vaughan’s talk about Ark at <a href="https://youtu.be/8uRcB34Hhsw?si=UeWqIi9PtEOWqRsp&#038;t=2109" rel="nofollow" target="_blank">posit conf 2024</a>, especially the part about <a href="https://youtu.be/8uRcB34Hhsw?si=GBqntC6tW7D2WhBN&#038;t=2455" rel="nofollow" target="_blank">code assistance</a>.</p>
<p>Other development environments such as <a href="https://lists.gnu.org/archive/html/emacs-devel/2022-11/msg01443.html" rel="nofollow" target="_blank">Emacs</a> also have support for Tree-sitter.</p>
<h2>
Searching/browsing code
</h2><p>You can parse and search R code using the {treesitter} R package and the <a href="https://tree-sitter.github.io/tree-sitter/4-code-navigation.html" rel="nofollow" target="_blank">Tree-sitter query syntax</a>. The {treesitter} R package is a dependency of Simon Couch’s <a href="https://simonpcouch.github.io/gander/" rel="nofollow" target="_blank">{gander} package</a>, which aims to improve the experience of working with LLMs when writing R code. Another use case of the {treesitter} R package is {igraph.r2cdocs}, an <a href="https://roxygen2.r-lib.org/dev/articles/extending.html" rel="nofollow" target="_blank">extension</a> to {roxygen2} for the {igraph} package. It <a href="https://github.com/igraph/igraph.r2cdocs/blob/6be2a327a18deb823302caeab8b60a916f6fac62/R/roxygen.R#L119" rel="nofollow" target="_blank">parses all of igraph’s R code</a> to identify, for each exported function, whether it directly or indirectly calls a function whose name ends with <code>_impl</code>; such a call indicates a wrapper around a C igraph function, whose documentation can then be linked from the manual page of the R function.</p>
<p>The {pkgdepends} package calls Tree-sitter (<a href="https://github.com/r-lib/pkgdepends/blob/main/src/tree-sitter.c" rel="nofollow" target="_blank">C</a>) to detect <a href="https://github.com/r-lib/pkgdepends/blob/634661a7d91b41476fd1ab653fe3087a6e40b8a9/R/scan-deps.R#L340" rel="nofollow" target="_blank">dependencies in files</a>. Below we run it on the source of the <a href="https://docs.ropensci.org/saperlipopette/" rel="nofollow" target="_blank">saperlipopette R package</a>.</p>
<div class="highlight">
<pre>pkgdepends::scan_deps(
 &quot;../../../../../CHAMPIONS/saperlipopette&quot;,
 &quot;../../../../../CHAMPIONS&quot;
)
#&gt; 
#&gt; Dependencies:
#&gt; + brio  @ R/blame.R, R/check-editor.R, R/clean-dir.R, R/committed-to-main.R, R/committed-to-wrong-branch.R, R/conflict…
#&gt; + cli  @ inst/exo_bisect-Rprofile.en.R, inst/exo_bisect-Rprofile.es.R, inst/exo_bisect-Rprofile.fr.R, inst/exo_blame-…
#&gt; + devtools  @ saperlipopette.Rproj
#&gt; + fs  @ R/blame.R, R/check-editor.R, R/clean-dir.R, R/committed-to-main.R, R/committed-to-wrong-branch.R, R/conflict…
#&gt; + gert  @ inst/exo_check_editor-Rprofile.en.R, inst/exo_check_editor-Rprofile.es.R, inst/exo_check_editor-Rprofile.fr.…
#&gt; + knitr  @ README.Rmd
#&gt; + parsedate  @ R/utils-git.R
#&gt; + purrr  @ R/create-all.R, R/debug.R, R/log-deleted-file.R, R/log-deleted-line.R, R/revparse.R, R/roxygen2.R, R/worktre…
#&gt; + rlang  @ R/create-all.R, R/roxygen2.R, R/utils-fs.R, R/utils-usethis.R, R/zzz.R
#&gt; + rmarkdown  @ README.Rmd, vignettes/saperlipopette.qmd
#&gt; + roxygen2  @ R/roxygen2.R, saperlipopette.Rproj
#&gt; + saperlipopette @ README.Rmd, vignettes/saperlipopette.qmd
#&gt; + tibble  @ R/roxygen2.R
#&gt; + usethis  @ R/blame.R, R/check-editor.R, R/clean-dir.R, R/committed-to-main.R, R/committed-to-wrong-branch.R, R/conflict…
#&gt; + vctrs  @ R/roxygen2.R
#&gt; + withr  @ R/blame.R, R/check-editor.R, R/clean-dir.R, R/committed-to-main.R, R/committed-to-wrong-branch.R, R/conflict…
#&gt; 
#&gt; Test dependencies:
#&gt; + fs  @ tests/testthat/test-blame.R, tests/testthat/test-check-editor.R, tests/testthat/test-clean-dir.R, tests/test…
#&gt; + gert  @ tests/testthat/test-blame.R, tests/testthat/test-clean-dir.R, tests/testthat/test-committed-to-main.R, tests…
#&gt; + rlang  @ tests/testthat/test-blame.R, tests/testthat/test-check-editor.R, tests/testthat/test-clean-dir.R, tests/test…
#&gt; + saperlipopette @ tests/testthat.R
#&gt; + testthat  @ tests/testthat.R
#&gt; + withr  @ tests/testthat/test-blame.R, tests/testthat/test-check-editor.R, tests/testthat/test-clean-dir.R, tests/test…
</pre>
</div>
<p><a href="https://ast-grep.github.io/" rel="nofollow" target="_blank">ast-grep</a> is a useful tool built on Tree-sitter for searching and re-writing code, with a clearer query syntax than Tree-sitter’s. Its name is reminiscent of grep, but with ast-grep we do not need to write brittle regular expressions <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f638.png" alt="😸" class="wp-smiley" style="height: 1em; max-height: 1em;" />. <a href="https://astgrepr.etiennebacher.com/" rel="nofollow" target="_blank">{astgrepr}</a> by Etienne Bacher is an R wrapper to the Rust bindings of ast-grep, and is used in Etienne’s <a href="https://flir.etiennebacher.com/" rel="nofollow" target="_blank">{flir} package</a> for <a href="https://flir.etiennebacher.com/articles/adding_rules" rel="nofollow" target="_blank">refactoring</a> code.</p>
<p>The ast-grep command-line interface (CLI) itself is featured in a useful <a href="https://emilhvitfeldt.com/post/ast-grep-r-claude-code/" rel="nofollow" target="_blank">blog post by Emil Hvitfeldt</a> where he explains how to document the usage of ast-grep for Claude.</p>
<h2>
Formatting and linting: Air, Jarl
</h2><p>Speaking of CLIs…</p>
<figure><img src="https://i2.wp.com/ropensci.org/blog/2026/04/02/tree-sitter-overview/meme.png?w=578&#038;ssl=1"
alt="Cute kitten attacked by robots. The text says &#39;Everytime you use Claude for something a CLI can do, a kitten dies&#39;." data-recalc-dims="1">
</figure>
<p><a href="https://posit-dev.github.io/air/cli.html" rel="nofollow" target="_blank">Air</a>, by Davis Vaughan and Lionel Henry, is a CLI built on Tree-sitter, in Rust. It <em>reformats</em> code blazingly fast.</p>
<p><a href="https://jarl.etiennebacher.com/" rel="nofollow" target="_blank">Jarl</a>, by Etienne Bacher, is a CLI built on Air, therefore also on Tree-sitter, in Rust. It <em>lints</em> and <em>fixes</em> code, also blazingly fast. It can even detect <a href="https://jarl.etiennebacher.com/rules/unreachable_code" rel="nofollow" target="_blank">unreachable code</a>, <a href="https://jarl.etiennebacher.com/rules/unused_function" rel="nofollow" target="_blank">unused functions</a> and <a href="https://jarl.etiennebacher.com/rules/duplicated_function_definition" rel="nofollow" target="_blank">duplicated function definitions</a>.</p>
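<p>For a sense of the workflow, Air is driven from the command line (or from format-on-save in your IDE). A minimal invocation, assuming the <code>air</code> binary is installed and on your PATH:</p>

```shell
# Reformat all R files under the current directory, in place.
air format .
```

<p>Jarl is invoked similarly from the command line; see its documentation for the exact subcommands and the list of rules it can fix automatically.</p>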
<p>In both of these examples, the creation of <em>CLIs</em> wrapping Rust bindings was more efficient than the creation of R packages wrapping the {treesitter} R package, for several reasons:</p>
<ul>
<li>Rust CLIs can edit code very fast<sup id="fnref:2"><a href="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/#fn:2" class="footnote-ref" role="doc-noteref" rel="nofollow" target="_blank">2</a></sup>;</li>
<li>CLIs are integrated in extensions for popular IDEs (for instance Positron);</li>
<li>a CLI is easier to install on CI than an R package that needs, well, an R installation.</li>
</ul>
<h2>
More tools
</h2><p>A brief mention of some other interesting tools we’ve explored a bit less.</p>
<h3>
Configuring: {ts} for parsing JSON and TOML (not R!)
</h3><p>The <a href="https://github.com/r-lib/ts" rel="nofollow" target="_blank">{ts}</a> package by Gábor Csárdi is the backbone of two R packages used for editing and manipulating:</p>
<ul>
<li>TOML <a href="https://gaborcsardi.github.io/tstoml/" rel="nofollow" target="_blank">{tstoml}</a>;</li>
<li>JSON <a href="https://gaborcsardi.github.io/tsjsonc/" rel="nofollow" target="_blank">{tsjson}</a>.</li>
</ul>
<p>Compared to existing parsers in R for those formats, these two packages preserve comments.</p>
<h3>
Testing code: {muttest}
</h3><p><a href="https://en.wikipedia.org/wiki/Mutation_testing" rel="nofollow" target="_blank">Mutation testing</a> is a kind of testing where you, say, randomly swap <code>+</code> with <code>-</code> in your code (you <em>mutate</em> it) and then run your tests to see whether they catch the mutant. The <a href="https://github.com/jakubsob/muttest" rel="nofollow" target="_blank">{muttest} package</a> by Jakub Sobolewski is an R package for mutation testing, which depends on the {treesitter} R package.</p>
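<p>To make the idea concrete, here is a toy sketch of one mutation round, independent of {muttest} (which operates on the syntax tree rather than on raw text as this simplified version does; the names below are made up for illustration):</p>

```r
# Original function and a small test for it.
add <- function(x, y) x + y
test_add <- function(f) isTRUE(all.equal(f(2, 3), 5))

# One "mutation": swap + for - in the deparsed source, then rebuild the function.
src <- deparse(add)
mutant_src <- gsub("+", "-", src, fixed = TRUE)
mutant <- eval(parse(text = paste(mutant_src, collapse = "\n")))

test_add(add)    # TRUE: the original passes
test_add(mutant) # FALSE: the test "kills" the mutant
```

<p>If a mutant survives (the tests still pass), that part of the code is effectively untested.</p>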
<h3>
Diffing code: difftastic
</h3><p>The difftastic CLI by Wilfred Hughes is “a structural diff tool that understands syntax”. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2728.png" alt="✨" class="wp-smiley" style="height: 1em; max-height: 1em;" /> This means that difftastic doesn’t only compare lines or “words” but actual syntax, looking at the lines around the ones that changed (by default, 3). Even better, it understands R out of the box. See this <a href="https://masalmon.eu/2026/03/30/difftastic/" rel="nofollow" target="_blank">blog post with examples of R code diffing</a>.</p>
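<p>Assuming difftastic is installed (its binary is named <code>difft</code>), a structural diff of two R files is a single call:</p>

```shell
# Compare two versions of a script; difftastic picks the R parser from the
# .R extension and diffs syntax nodes rather than raw lines.
difft old.R new.R
```

<p>You can also point Git at it (for example via the <code>GIT_EXTERNAL_DIFF</code> environment variable) to get structural diffs from <code>git diff</code>.</p>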
<h2>
Conclusion: more to come?
</h2><p>In this post, we’ve presented an overview of Tree-sitter based tooling for R or in R.</p>
<p>Note that this ecosystem of tools is very actively developed, so some tools might come and go. However, the idea that plugging the R grammar into a general parser generator brings cool features to us R developers will remain true. Maybe <em>you</em> will contribute to this ecosystem, either through an existing tool or by creating a new one?</p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>We could also parse C code with it using <a href="https://sounkou-bioinfo.github.io/treesitter.c/" rel="nofollow" target="_blank">{treesitter.c}</a>. <a href="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/#fnref:1" class="footnote-backref" role="doc-backlink" rel="nofollow" target="_blank"><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p>
</li>
<li id="fn:2">
<p>Rust is a lower-level language than R, so it has less overhead; furthermore, this kind of Rust code can be easily parallelized. <a href="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/#fnref:2" class="footnote-backref" role="doc-backlink" rel="nofollow" target="_blank"><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p>
</li>
</ol>
</div>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://ropensci.org/blog/2026/04/02/tree-sitter-overview/"> rOpenSci - open tools for open science</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/04/a-better-r-programming-experience-thanks-to-tree-sitter/">A Better R Programming Experience Thanks to Tree-sitter</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400270</post-id>	</item>
		<item>
		<title>Techtonique dot net is down until further notice</title>
		<link>https://www.r-bloggers.com/2026/03/techtonique-dot-net-is-down-until-further-notice/</link>
		
		<dc:creator><![CDATA[T. Moudiki]]></dc:creator>
		<pubDate>Wed, 01 Apr 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://thierrymoudiki.github.io//blog/2026/04/01/r/python/techtonique/techtonique-dot-net-down</guid>

					<description><![CDATA[<p>Techtonique dot net is down until further notice</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/techtonique-dot-net-is-down-until-further-notice/">Techtonique dot net is down until further notice</a>]]></description>
<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://thierrymoudiki.github.io//blog/2026/04/01/r/python/techtonique/techtonique-dot-net-down"> T. Moudiki's Webpage - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p><strong>IMPORTANT: The website <a href="https://www.techtonique.net/" rel="nofollow" target="_blank">https://www.techtonique.net</a> is down until further notice.</strong></p>

<p><a href="https://www.techtonique.net/" rel="nofollow" target="_blank">https://www.techtonique.net</a> contained a language-agnostic API for machine learning tasks (classification, regression, survival analysis, forecasting, etc.).</p>

<p>As a result, please do not buy the Gumroad tutorial for the time being.</p>

<p>You can still use the packages <a href="https://github.com/Techtonique" rel="nofollow" target="_blank">https://github.com/Techtonique</a> locally.</p>

<p>PS: This is not an April Fools’ joke.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://thierrymoudiki.github.io//blog/2026/04/01/r/python/techtonique/techtonique-dot-net-down"> T. Moudiki's Webpage - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/techtonique-dot-net-is-down-until-further-notice/">Techtonique dot net is down until further notice</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400260</post-id>	</item>
		<item>
		<title>Transgender Day of Visibility</title>
		<link>https://www.r-bloggers.com/2026/03/transgender-day-of-visibility/</link>
		
		<dc:creator><![CDATA[Jerry Tuttle]]></dc:creator>
		<pubDate>Tue, 31 Mar 2026 04:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://www.r-bloggers.com/?guid=5ea12746cec1b25e0746ca21879d6565</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>     <br />
March 31 is Transgender Day of Visibility. I’m not transgender myself, but I have friends, acquaintances, and family members who are. Chances are you do too, whether you realize it or not.   </p>
<p>  &#038;nbs...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/transgender-day-of-visibility/">Transgender Day of Visibility</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://onlinecollegemathteacher.blogspot.com/2026/03/transgender-day-of-visibility.html"> Online College Math Teacher</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<font size = 3>

  
 <div class="separator" style="clear: both;"><a href="https://i1.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZfk2-sxWBT_tk6b5Aq5W_ZFTvUKWWBlbiDmUau8As6CakkBCcLnsYGrDEQpTekShVIbQFBUBWux9Kf7YcQ1LW6In9s8uh-vjliUKtDdpH5OzVs50JAeA9HTGG5gFrWy1eJqFub5puN3zIv5Pj0jji6KLwkmLFq1rKUx3KwIWOpmXhVcFRZCW-EY7iEqE/s1152/Transgender_Pride_Flag.jpeg?ssl=1" style="display: block; padding: 1em 0; text-align: center; " rel="nofollow" target="_blank"><img alt="" border="0" width="400" data-original-height="487" data-original-width="450" src="https://i0.wp.com/blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZfk2-sxWBT_tk6b5Aq5W_ZFTvUKWWBlbiDmUau8As6CakkBCcLnsYGrDEQpTekShVIbQFBUBWux9Kf7YcQ1LW6In9s8uh-vjliUKtDdpH5OzVs50JAeA9HTGG5gFrWy1eJqFub5puN3zIv5Pj0jji6KLwkmLFq1rKUx3KwIWOpmXhVcFRZCW-EY7iEqE/s400/Transgender_Pride_Flag.jpeg?resize=400%2C487&#038;ssl=1" data-recalc-dims="1"/></a></div>
  
     
<p>March 31 is Transgender Day of Visibility. I’m not transgender myself, but I have friends, acquaintances, and family members who are. Chances are you do too, whether you realize it or not.</p>

<p>My understanding is that transgender is an umbrella term for people whose gender identity or expression differs from the sex they were assigned at birth. It reflects a deeply held internal sense of self—something not defined by appearance, clothing, or medical procedures. Being transgender is about identity, not sexual orientation.</p>

<p>What I’ve learned over time is that many transgender people face challenges most of us never have to think about. These include discrimination, gaps in legal protection, denial of medical care, and even physical violence. There are also everyday barriers that rarely make headlines, like difficulty obtaining accurate driver’s licenses or passports—documents most of us take for granted.</p>

<p>I found this resource helpful: <a href="https://www.hrc.org/resources/understanding-the-transgender-community">Understanding the Transgender Community</a>.</p>

<p>I wish every transgender person could live their life openly, safely, and without being hassled for who they are.</p>

<p>The five-color Transgender Pride Flag was designed by Monica Helms in 1999. I made this flag in R. Here is the code.</p>

<pre>

library(ggplot2)

# Define the colors in order: Blue, Pink, White, Pink, Blue
trans_colors &lt;- c(&quot;#5BCEFA&quot;, &quot;#F5A9B8&quot;, &quot;#FFFFFF&quot;, &quot;#F5A9B8&quot;, &quot;#5BCEFA&quot;)

# Create a data frame for the 5 stripes
flag_data &lt;- data.frame(
  stripe = factor(1:5),
  height = rep(1, 5)
)

# Plot the flag
ggplot(flag_data, aes(x = 1, y = height, fill = stripe)) +
  geom_bar(stat = &quot;identity&quot;, width = 1, color = NA) +
  scale_fill_manual(values = rev(trans_colors)) + # Reverse to stack correctly
  theme_void() + # Remove axes and labels
  theme(legend.position = &quot;none&quot;) +
  coord_cartesian(expand = FALSE)


</pre>
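<p>For readers without ggplot2 installed, the same five stripes can be drawn with base R graphics alone. This is a minimal sketch under that assumption; the <code>rect()</code> coordinate scheme is my own choice, not from the original post:</p>

```r
# Base-R sketch of the same flag (assumption: ggplot2 is unavailable).
# Stripe i is drawn as a rectangle spanning y = 5 - i to y = 6 - i,
# so the first color in the vector lands on the top stripe.
trans_colors <- c("#5BCEFA", "#F5A9B8", "#FFFFFF", "#F5A9B8", "#5BCEFA")

# Empty canvas with no axes, exactly tall enough for five unit-height stripes
plot(NULL, xlim = c(0, 1), ylim = c(0, 5), axes = FALSE,
     xlab = "", ylab = "", xaxs = "i", yaxs = "i")
for (i in seq_along(trans_colors)) {
  rect(0, 5 - i, 1, 6 - i, col = trans_colors[i], border = NA)
}
```

<p>Because the palette is symmetric (blue, pink, white, pink, blue), top-to-bottom and bottom-to-top stacking give the same picture, which is also why the <code>rev()</code> in the ggplot2 version is harmless.</p>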
</font>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://onlinecollegemathteacher.blogspot.com/2026/03/transgender-day-of-visibility.html"> Online College Math Teacher</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/transgender-day-of-visibility/">Transgender Day of Visibility</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400212</post-id>	</item>
		<item>
		<title>Meet dataviewR: The View() You Always Wanted</title>
		<link>https://www.r-bloggers.com/2026/03/meet-dataviewr-the-view-you-always-wanted/</link>
		
		<dc:creator><![CDATA[Siddhesh Pujari]]></dc:creator>
		<pubDate>Tue, 31 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.html</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Disclaimer: This blog contains the authors’ own opinions, which do not necessarily reflect the strategy of their respective organizations.</p>
<p>The humble View() and its limits<br />
View() has served R programmers well for a long time — pass ...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/meet-dataviewr-the-view-you-always-wanted/">Meet dataviewR: The View() You Always Wanted</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.html"> pharmaverse blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<!--------------- typical setup ----------------->
<!--------------- post begins here ----------------->
<p>Disclaimer: This blog contains the authors’ own opinions, which do not necessarily reflect the strategy of their respective organizations.</p>
<section id="the-humble-view-and-its-limits" class="level2">
<h2 class="anchored" data-anchor-id="the-humble-view-and-its-limits">The humble <code>View()</code> and its limits</h2>
<p><code>View()</code> has served R programmers well for a long time — pass it a data frame, get a spreadsheet-style window. It even has a basic search bar for checking whether a value exists in your data. But the moment you need something more precise — filter by a specific column or combine conditions — you are back in your script.</p>
<p>Add to that: no side-by-side dataset comparison, no metadata inspection, and no way to carry your exploration into reproducible code. For day-to-day clinical data work — reviewing your clinical datasets, let’s say <code>ADSL</code>, cross-checking subject demographics against lab data in <code>ADLB</code>, doing a QC pass before analysis — these gaps add up.</p>
</section>
<section id="what-is-dataviewr" class="level2">
<h2 class="anchored" data-anchor-id="what-is-dataviewr">What is dataviewR?</h2>
<p><a href="https://madhankumarnagaraji.github.io/dataviewR/" rel="nofollow" target="_blank"><code>dataviewR</code></a> is a Shiny-based interactive data viewer that works alongside <code>View()</code> as a companion — not a replacement. It launches directly in your RStudio Viewer pane, requires no Shiny code, and never modifies the datasets passed to it.</p>
<pre>install.packages(&quot;dataviewR&quot;)</pre>
</section>
<section id="features" class="level2">
<h2 class="anchored" data-anchor-id="features">Features</h2>
<p><code>dataviewR</code> offers the following capabilities:</p>
<ul>
<li><strong>Interactive Filtering</strong> — apply <code>dplyr</code>-style expressions directly in the app, no script changes needed. Supports <code>%in%</code>, <code>is.na()</code>, <code>grepl()</code>, and compound conditions — the same syntax you already write every day.</li>
</ul>
<pre>SEX == &quot;F&quot; & AGE &gt; 65 & TRT01P == &quot;Xanomeline High Dose&quot;</pre>
<ul>
<li><strong>Reproducible Code Generation</strong> — hit “Generate R Code” and walk away with ready-to-use <code>dplyr</code> code from your interactions. Your exploration session feeds directly into your scripted workflow.</li>
</ul>
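<p>To make the round trip concrete, here is the general shape of <code>dplyr</code> code such a filter-plus-column-selection session could hand back. This is a hand-written sketch on a toy data frame, not actual <code>dataviewR</code> output; the column names and values only mimic <code>ADSL</code>:</p>

```r
library(dplyr)

# Toy stand-in for ADSL (hypothetical subjects, for illustration only)
adsl_toy <- data.frame(
  USUBJID = c("01-701-1015", "01-701-1023", "01-701-1028"),
  SEX     = c("F", "F", "M"),
  AGE     = c(70, 60, 72),
  TRT01P  = c("Xanomeline High Dose", "Xanomeline High Dose", "Placebo")
)

# The kind of pipeline a "Generate R Code" click could produce from
# the filter expression shown above plus a column selection:
result <- adsl_toy %>%
  filter(SEX == "F" & AGE > 65 & TRT01P == "Xanomeline High Dose") %>%
  select(USUBJID, SEX, AGE, TRT01P)

print(result)
```

<p>Pasting code like this back into a script is what turns an interactive exploration session into a reproducible step.</p>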
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://i1.wp.com/pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/dataview_codegen.png?w=578&#038;ssl=1" class="img-fluid figure-img" style="width:60.0%" data-recalc-dims="1"></p>
<figcaption>Auto-generated dplyr code from interactive filter and column selections.</figcaption>
</figure>
</div>
<ul>
<li><strong>Variable Metadata</strong> — inspect variable classes, labels and attributes without writing <code>str()</code> or <code>attr()</code> calls. Particularly useful for clinical datasets where CDISC-style labels are carried as R attributes.</li>
</ul>
<p>See all of this in action:</p>
<video width="450" controls="" playbackrate="2" onloadedmetadata="this.playbackRate = 2;">
<source src="dataview_explore.mp4" type="video/mp4">
</video>
</section>
<section id="cross-checking-multiple-datasets" class="level2">
<h2 class="anchored" data-anchor-id="cross-checking-multiple-datasets">Cross-Checking Multiple Datasets</h2>
<p>But what if you need to look at more than one dataset at the same time? <code>dataviewR</code> handles that too — pass multiple datasets in a single call and each opens in its own tab within the same session. Switch between them, filter independently, and track a specific subject across datasets — the kind of check that comes up in every safety review.</p>
<pre>library(dataviewR)
library(pharmaverseadam)

dataviewer(adsl, adlb)</pre>
<video width="450" controls="" onloadedmetadata="this.playbackRate = 1.5;">
<source src="dataview_multidata.mp4" type="video/mp4">
</video>
</section>
<section id="final-thoughts" class="level2">
<h2 class="anchored" data-anchor-id="final-thoughts">Final Thoughts</h2>
<p>That is dataviewR in a nutshell. Try it out and share your thoughts on <a href="https://github.com/madhankumarnagaraji/dataviewR" rel="nofollow" target="_blank">GitHub</a> or <a href="https://pharmaverse.slack.com/" rel="nofollow" target="_blank">pharmaverse Slack</a>!</p>
<p>Full documentation, vignettes, and clinical dataset examples are available at <a href="https://madhankumarnagaraji.github.io/dataviewR/" rel="nofollow" target="_blank">madhankumarnagaraji.github.io/dataviewR</a>.</p>
<!--------------- appendices go here ----------------->
</section>
<div class="cell">
<div class="cell-output-display">


</div>
</div>



<div id="quarto-appendix" class="default"><section id="last-updated" class="level2 appendix"><h2 class="anchored quarto-appendix-heading">Last updated</h2><div class="quarto-appendix-contents">

<p>2026-03-31 18:28:15.461796</p>
</div></section><section id="details" class="level2 appendix"><h2 class="anchored quarto-appendix-heading">Details</h2><div class="quarto-appendix-contents">

<p><a href="https://github.com/pharmaverse/blog/tree/main/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.qmd" rel="nofollow" target="_blank">Source</a>, <a href="https://pharmaverse.github.io/blog/session_info.html" rel="nofollow" target="_blank">Session info</a></p>
</div></section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="nofollow" href="https://creativecommons.org/licenses/by/4.0/" target="_blank">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre>@online{pujari2026,
  author = {Pujari, Siddhesh and Kumar N, Madhan and S, Gomathi and
    Haight, Mackenzie},
  title = {Meet {dataviewR:} {The} {View()} {You} {Always} {Wanted}},
  date = {2026-03-31},
  url = {https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.html},
  langid = {en}
}
</pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-pujari2026" class="csl-entry quarto-appendix-citeas">
Pujari, Siddhesh, Madhan Kumar N, Gomathi S, and Mackenzie Haight. 2026.
<span>“Meet dataviewR: The View() You Always Wanted.”</span> March 31,
2026. <a href="https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.html" rel="nofollow" target="_blank">https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.html</a>.
</div></div></section></div> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/meet-dataviewr-the-view-you-always-wanted.html"> pharmaverse blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/meet-dataviewr-the-view-you-always-wanted/">Meet dataviewR: The View() You Always Wanted</a>]]></content:encoded>
					
		
		<enclosure url="https://pharmaverse.github.io/blog/posts/2026-03-29-meet-dataviewr-the/dataviewR-logo.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">400225</post-id>	</item>
		<item>
		<title>AGENTS.md, {admiral}, and the AI-Assisted Programmer</title>
		<link>https://www.r-bloggers.com/2026/03/agents-md-admiral-and-the-ai-assisted-programmer/</link>
		
		<dc:creator><![CDATA[Jeff Dickinson]]></dc:creator>
		<pubDate>Tue, 31 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://pharmaverse.github.io/blog/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.html</guid>

					<description><![CDATA[<p>Introduction<br />
AI coding assistants are becoming a natural part of how clinical R programmers work — autocompleting functions, suggesting test cases, drafting derivations. But out of the box, these agents know nothing about ADaM conventions, CD...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/agents-md-admiral-and-the-ai-assisted-programmer/">AGENTS.md, {admiral}, and the AI-Assisted Programmer</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://pharmaverse.github.io/blog/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.html"> pharmaverse blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report an issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<!--------------- typical setup ----------------->
<!--------------- post begins here ----------------->
<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>AI coding assistants are becoming a natural part of how clinical R programmers work — autocompleting functions, suggesting test cases, drafting derivations. But out of the box, these agents know nothing about ADaM conventions, CDISC standards, or how the pharmaverse ecosystem fits together. They don’t know that analysis flag variables such as <code>ANL01FL</code> typically take <code>&quot;Y&quot;</code> or <code>NA</code> and not <code>&quot;N&quot;</code>. They don’t know that <code>{pharmaversesdtm}</code> is the canonical source of test SDTM data, or that <code>{xportr}</code> is waiting downstream to turn your dataset into a submission-ready XPT file. <code>AGENTS.md</code> is a simple, open standard that changes that — and the <code>{admiral}</code> ecosystem now has infrastructure to generate and maintain these files automatically across every package in the family.</p>
</section>
<section id="what-is-agents.md" class="level2">
<h2 class="anchored" data-anchor-id="what-is-agents.md">What Is AGENTS.md?</h2>
<p><code>AGENTS.md</code> is a plain markdown file you commit to your repository that gives AI coding agents the context they need to work correctly in your project. Think of it as a README for agents — while <code>README.md</code> tells a new developer what the project is, <code>AGENTS.md</code> tells an AI assistant how to work in it correctly.</p>
<p>The format is supported across the growing ecosystem of AI coding tools: OpenAI Codex, GitHub Copilot, Google’s Jules, Cursor, Aider, Gemini CLI, and more. One file, version-controlled alongside your code, works everywhere.</p>
</section>
<section id="why-this-matters-for-the-pharmaverse" class="level2">
<h2 class="anchored" data-anchor-id="why-this-matters-for-the-pharmaverse">Why This Matters for the pharmaverse</h2>
<p>ADaM derivations encode decades of CDISC regulatory expectations that don’t appear anywhere in the R syntax. The fact that <code>ANL01FL</code> is an analysis flag with specific derivation logic, that <code>DTYPE = &quot;LLOQ&quot;</code> imputation records follow specific rules — none of this is inferable from the code alone.</p>
<p>The <code>{admiral}</code> package also doesn’t exist in isolation. It operates in a pipeline that flows from <code>{pharmaversesdtm}</code> test data through admiral derivations, often guided by <code>{metacore}</code> specifications, and ultimately out through <code>{xportr}</code> to submission-ready XPT files. An agent writing admiral code without that context is like a new programmer who only knows the function they’re editing — not the system it belongs to.</p>
<p>An <code>AGENTS.md</code> in an admiral-family repository can communicate all of this before the agent writes a single line of code.</p>
</section>
<section id="a-first-step-in-the-pharmaverse-ai-strategy" class="level2">
<h2 class="anchored" data-anchor-id="a-first-step-in-the-pharmaverse-ai-strategy">A First Step in the pharmaverse AI Strategy</h2>
<p>The <code>{admiral}</code> team is actively discussing how to formalize its approach to AI-assisted development — what tools to encourage, what guardrails to put in place, and how to document AI’s role in the programming strategy. That conversation is still early, and deliberately so: the consensus is to gain real experience before locking in formal guidance.</p>
<p>The good news is that experience is already arriving, fast.</p>
<p><strong>March 7, 2026 — the question arrives.</strong> <a href="https://github.com/pharmaverse/admiral/pull/2996" rel="nofollow" target="_blank">PR #2996</a> landed from a new contributor, <code>maxthecat2024</code>, fixing poorly formatted warning messages in <code>derive_param_computed()</code>. The PR was thorough and well-structured — detailed before/after examples, snapshot test conversions, a fully completed checklist. It was also the kind of contribution that made the team wonder: was this an AI bot? We don’t know for certain, and ultimately it didn’t matter — the code was good and it got merged. But the question itself was telling.</p>
<p><strong>March 17, 2026 — the reality arrives.</strong> <a href="https://github.com/pharmaverse/admiral/pull/3010" rel="nofollow" target="_blank">PR #3010</a> was opened not by a human contributor, but by GitHub Copilot itself — the branch named <code>copilot/enhance-examples-derive-vars-merged-summary</code>, the author listed as <code>Copilot</code>. The PR enhanced documentation examples for <code>derive_vars_merged_summary()</code>, correctly picking up the admiral-specific <code>@examplesx</code> structured example pattern from the existing codebase — context that came directly from <code>AGENTS.md</code>.</p>
<p>But the PR also revealed an important limitation. Rather than running <code>devtools::document()</code> to regenerate the <code>.Rd</code> file, Copilot manually edited <code>man/derive_vars_merged_summary.Rd</code> directly — and its own PR description acknowledged this: <em>“Manually updated to match <code>roxygen2::roxygenize()</code> output.”</em> When a reviewer pointed this out, Copilot responded candidly: <em>“R is not available in my sandbox environment, so I can’t execute <code>devtools::document()</code> directly. For future sessions, I understand the correct workflow.”</em> A human reviewer ran <code>devtools::document()</code> outside the sandbox and pushed the correctly generated <code>.Rd</code> file in commit <code>c855860</code>.</p>
<p>This is an important nuance: the issue wasn’t that <code>AGENTS.md</code> was unclear — it was that Copilot’s execution environment simply didn’t have R available. No instruction, however well-written, can make an agent run a command it physically cannot execute. <code>AGENTS.md</code> can teach an agent the correct workflow; ensuring the environment supports that workflow is a separate, human-owned responsibility. That distinction matters as the community develops its AI strategy.</p>
<p>This two-week window tells the whole story. AI-assisted contributions are already arriving in the <code>{admiral}</code> repository. <code>AGENTS.md</code> is already helping agents understand project-specific conventions. And the gaps it exposes are already informing improvements. This is the feedback loop the community needs to build a thoughtful AI strategy — not speculation, but evidence.</p>
<p><code>AGENTS.md</code> represents the first tangible infrastructure to come out of that thinking. Whether a contributor is a human using an AI assistant, an autonomous agent, or something in between — the code still needs to follow ADaM conventions and pharmaverse standards. <code>AGENTS.md</code> helps ensure it does.</p>
<p>If you have thoughts on what the broader strategy should look like, the discussion is open — join the conversation at <a href="https://github.com/pharmaverse/admiraldev/issues/547" rel="nofollow" target="_blank">admiraldev issue #547</a>. The broader scientific open-source community is working through similar questions: rOpenSci recently published an <a href="https://ropensci.org/blog/2026/02/26/ropensci-ai-policy/" rel="nofollow" target="_blank">AI policy</a> that’s worth reading as a reference point for how these norms are taking shape. Notably, rOpenSci’s policy calls out <code>agents.md</code> directly in its software review submission template:</p>
<blockquote class="blockquote">
<p>“If your repository includes an ‘agents.md’ file or equivalent, please provide a link, and describe how this has been used in the development of your package.”</p>
</blockquote>
<p>That the broader open-source scientific community is already asking for <code>AGENTS.md</code> as part of peer review is a signal that this norm is catching on fast.</p>
</section>
<section id="the-admiral-ecosystem-implementation" class="level2">
<h2 class="anchored" data-anchor-id="the-admiral-ecosystem-implementation">The Admiral Ecosystem Implementation</h2>
<p>Rather than each package maintaining its own file by hand, the generation logic is centralized in <code>{admiralci}</code> and delivers a consistent, up-to-date file to every package that opts in. The workflow pulls together content from several sources:</p>
<ul>
<li><strong>Programming strategy and unit testing guidelines</strong> from <code>{admiraldev}</code></li>
<li><strong>Package-specific context</strong> from a YAML file in each repository (therapeutic area, related packages, relevant CDISC IGs)</li>
<li><strong>Ecosystem context</strong> describing how admiral-family packages fit into the broader pharmaverse pipeline</li>
<li><strong>ADaM fundamentals</strong> covering key variable conventions and controlled terminology patterns</li>
</ul>
<p>The <code>{admiral}</code> <code>AGENTS.md</code> is substantial — over 1,300 lines of auto-generated context pulled directly from the <code>{admiraldev}</code> programming strategy, git usage, and R CMD check vignettes. It even includes a built-in verification mechanism: agents are instructed to add the comment <code># admiral guidelines loaded</code> to the first line of every new R file they create, confirming the guidelines were actually read. It’s a small but clever way to make agent compliance observable during code review.</p>
<p>Here’s a simplified illustration of the kind of content the file contains:</p>
<pre># AGENTS.md — admiral

## Package Overview
{admiral} provides a toolbox for ADaM dataset construction in R,
following CDISC ADaM standards and pharmaverse conventions.

## ADaM Conventions
- Flag variables (ANL01FL, SAFFL, etc.) take values &quot;Y&quot; or NA
- PARAM/PARAMCD pairs must align with CDISC controlled terminology
- ASEQ must be derived as the last step before dataset finalization

## Ecosystem Context
- Test SDTM data: use {pharmaversesdtm} (CDISC pilot data)
- Downstream: datasets consumed by {xportr} for XPT transport files
- Metadata: {metacore}/{metatools} provide spec-driven variable control

## Unit Testing
- Use {testthat} with expect_dfs_equal() for dataset comparisons
- Every new function requires tests for typical use, edge cases, and errors

## Documentation
- Run devtools::document() to regenerate .Rd files — never edit man/ directly
- Update NEWS.md for any user-facing function changes</pre>
<p>One practical note: <code>AGENTS.md</code> at the repository root triggers a NOTE in R CMD check, so the file is added to <code>.Rbuildignore</code>. A copy also lives in <code>tests/testthat/</code>, where testing-specific guidance is scoped closest to where it’s needed. Extension packages like <code>{admiralonco}</code>, <code>{admiralvaccine}</code>, and <code>{admiralpeds}</code> can layer their own context on top, adding therapeutic area-specific conventions without duplicating shared infrastructure content.</p>
</section>
<section id="how-to-adopt-this-in-your-package" class="level2">
<h2 class="anchored" data-anchor-id="how-to-adopt-this-in-your-package">How to Adopt This in Your Package</h2>
<p>If you maintain an admiral-family package, adoption is three steps:</p>
<ol type="1">
<li>Add a YAML configuration file to your repository with package-specific context (therapeutic area, related packages, relevant CDISC IGs)</li>
<li>Reference the reusable workflow from <code>{admiralci}</code> in your <code>.github/workflows/</code> directory</li>
<li>Add <code>^AGENTS\.md$</code> to your <code>.Rbuildignore</code></li>
</ol>
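<p>As an illustration, the package-context YAML from step 1 might look something like this. Every key name here is hypothetical — the post describes only what the file contains (therapeutic area, related packages, relevant CDISC IGs), so check the <code>{admiralci}</code> workflow documentation for the actual schema:</p>

```yaml
# Hypothetical package-context file; key names are illustrative,
# not the real {admiralci} schema.
package: admiralonco
therapeutic_area: oncology
related_packages:
  - admiral
  - pharmaversesdtm
  - metacore
cdisc_igs:
  - "ADaM Implementation Guide"
  - "ADaM Basic Data Structure for Time-to-Event Analyses"
```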
<p>The workflow then runs on a schedule, pulling the latest content from <code>{admiraldev}</code> and your package YAML and committing an updated <code>AGENTS.md</code> automatically.</p>
</section>
<section id="resources" class="level2">
<h2 class="anchored" data-anchor-id="resources">Resources</h2>
<ul>
<li><code>AGENTS.md</code> standard: <a href="https://agents.md/" class="uri" rel="nofollow" target="_blank">https://agents.md</a></li>
<li><code>{admiral}</code> <code>AGENTS.md</code> (live): <a href="https://github.com/pharmaverse/admiral/blob/main/AGENTS.md" class="uri" rel="nofollow" target="_blank">https://github.com/pharmaverse/admiral/blob/main/AGENTS.md</a></li>
<li><code>{admiral}</code> <code>sync-admiralci-agents</code> workflow (live): <a href="https://github.com/pharmaverse/admiral/blob/main/.github/workflows/sync-admiralci-agents.yml" class="uri" rel="nofollow" target="_blank">https://github.com/pharmaverse/admiral/blob/main/.github/workflows/sync-admiralci-agents.yml</a></li>
<li>pharmaverse AI strategy discussion: <a href="https://github.com/pharmaverse/admiraldev/issues/547" class="uri" rel="nofollow" target="_blank">https://github.com/pharmaverse/admiraldev/issues/547</a></li>
<li>pharmaverse examples site: <a href="https://pharmaverse.github.io/examples/" class="uri" rel="nofollow" target="_blank">https://pharmaverse.github.io/examples/</a></li>
<li>rOpenSci AI policy: <a href="https://ropensci.org/blog/2026/02/26/ropensci-ai-policy/" class="uri" rel="nofollow" target="_blank">https://ropensci.org/blog/2026/02/26/ropensci-ai-policy/</a></li>
</ul>
<!--------------- appendices go here ----------------->
</section>



<div id="quarto-appendix" class="default"><section id="last-updated" class="level2 appendix"><h2 class="anchored quarto-appendix-heading">Last updated</h2><div class="quarto-appendix-contents">

<p>2026-03-30 18:40:51.936093</p>
</div></section><section id="details" class="level2 appendix"><h2 class="anchored quarto-appendix-heading">Details</h2><div class="quarto-appendix-contents">

<p><a href="https://github.com/pharmaverse/blog/tree/main/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.qmd" rel="nofollow" target="_blank">Source</a>, <a href="https://pharmaverse.github.io/blog/session_info.html" rel="nofollow" target="_blank">Session info</a></p>
</div></section><section class="quarto-appendix-contents" id="quarto-reuse"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div class="quarto-appendix-contents"><div><a rel="nofollow" href="https://creativecommons.org/licenses/by/4.0/" target="_blank">CC BY 4.0</a></div></div></section><section class="quarto-appendix-contents" id="quarto-citation"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre>@online{dickinson2026,
  author = {Dickinson, Jeff},
  title = {AGENTS.md, \{Admiral\}, and the {AI-Assisted} {Programmer}},
  date = {2026-03-31},
  url = {https://pharmaverse.github.io/blog/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.html},
  langid = {en}
}
</pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-dickinson2026" class="csl-entry quarto-appendix-citeas">
Dickinson, Jeff. 2026. <span>“AGENTS.md, {Admiral}, and the AI-Assisted
Programmer.”</span> March 31, 2026. <a href="https://pharmaverse.github.io/blog/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.html" rel="nofollow" target="_blank">https://pharmaverse.github.io/blog/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.html</a>.
</div></div></section></div> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://pharmaverse.github.io/blog/posts/2026-03-31-agents-md-admiral-a/agents-md-admiral-and-the-ai-assisted-programmer.html"> pharmaverse blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/agents-md-admiral-and-the-ai-assisted-programmer/">AGENTS.md, {admiral}, and the AI-Assisted Programmer</a>]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">400187</post-id>	</item>
		<item>
		<title>UK monarchs’ longevity against their people: a demographically correct reanalysis</title>
		<link>https://www.r-bloggers.com/2026/03/uk-monarchs-longevity-against-their-people-a-demographically-correct-reanalysis/</link>
		
		<dc:creator><![CDATA[Ilya Kashnitsky]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 22:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://ikashnitsky.phd/2026/royal-longevity/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>NoteEnd of March is the time when I remember with warm nostalgia the vivid memories of working alongside and learning from Jim Vaupel, who died untimely on 27th March 2022. He was a brilliant demographer and a vital person who radiated lov...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/uk-monarchs-longevity-against-their-people-a-demographically-correct-reanalysis/">UK monarchs’ longevity against their people: a demographically correct reanalysis</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://ikashnitsky.phd/2026/royal-longevity/"> Ilya Kashnitsky</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<div class="callout callout-style-default callout-note callout-empty-content callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>End of March is when I recall with warm nostalgia the vivid memories of working alongside and learning from Jim Vaupel, whose untimely death came on 27 March 2022. He was a brilliant demographer and a vital person who radiated love for demography and influenced generations of researchers in finding and shaping their academic paths. Please read more about Jim on our collective memorial webpage – <a href="https://remembering-james-vaupel.org/" class="uri" rel="nofollow" target="_blank">https://remembering-james-vaupel.org</a> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f49a.png" alt="💚" class="wp-smiley" style="height: 1em; max-height: 1em;" /> In this post, I’m revisiting one of the last projects I worked on with Jim. Unlike all other posts on my blog, this one uses the plural voice, since an earlier draft of the analysis was co-authored with Jim.
</div>
</div>
<div class="callout-body-container callout-body">

</div>
</div>
<section id="resonant-headlines-in-the-context-of-global-news" class="level1">
<h1>Resonant headlines in the context of global news</h1>
<p>Following the death of Prince Philip in April 2021, <em>The Conversation</em> published a <a href="https://theconversation.com/long-live-the-monarchy-british-royals-tend-to-survive-a-full-three-decades-longer-than-their-subjects-158766" rel="nofollow" target="_blank">piece by Jay Olshansky</a> titled <em>“Long live the monarchy! British royals tend to survive a full three decades longer than their subjects.”</em> In this piece – essentially a blog post, yet routinely perceived by the media almost as a peer-reviewed article, the usual problem with The Conversation – the author compared the longevity of the last six UK monarchs and their spouses with that of their subjects. Employing a deeply flawed analysis, Olshansky arrived at sensational conclusions, which were, of course, elevated to the title of the piece and to the title of its only figure, which <a href="https://www.altmetric.com/details/103817538" rel="nofollow" target="_blank">circulated widely in the media</a>.</p>
<p><img src="https://i1.wp.com/ikashnitsky.phd/2026/royal-longevity/olshansky-plot.jpg?w=578&#038;ssl=1" class="img-fluid" style="width:80.0%" data-recalc-dims="1"></p>
<p>Drawing far-reaching conclusions based on a handful of individuals’ lifespans is already very problematic, since longevity of humans fluctuates a lot by chance. Yet, apart from this obvious statistical limitation, there are at least two purely demographic methodological flaws in the analysis that make the conclusions completely wrong. Olshansky compared the lifespan of a UK monarch or spouse with the <strong>period life expectancy</strong> that prevailed in the <strong>year of their birth</strong>. This is demographically wrong, for at least two reasons.</p>
</section>
<section id="flawed-design-of-the-analysis" class="level1 page-columns page-full">
<h1>Flawed design of the analysis</h1>
<p><strong>Firstly</strong>, period life expectancy for a given year is a poor predictor of the future lifespan of a child born in that year. <sup>1</sup> Despite its seemingly straightforward name, period life expectancy is <strong>not</strong> designed to forecast longevity, even though the indicator is all too often misinterpreted this way. It is just a summary measure of current age-specific death rates in a population. In other words, life expectancy gives the average length of life for a cohort of newborns only in the unlikely case – never observed in recorded human history – that death rates remain unchanged throughout their lives. Mortality, however, has <a href="https://doi.org/10.1073/pnas.2019536118" rel="nofollow" target="_blank">decreased substantially</a> over the past two centuries in all countries, including the UK, and the actual longevity of people born in a specific year is usually <a href="https://doi.org/10.1080/00324720600895876" rel="nofollow" target="_blank">much higher</a> than the period life expectancy observed when they were born.</p>
<div class="no-row-height column-margin column-container"><div id="fn1"><p><sup>1</sup> Have a look at my <a href="https://ikashnitsky.github.io/2021/what-is-life-expectancy/" rel="nofollow" target="_blank">previous post</a> about common misinterpretations of life expectancy.</p></div></div><p>Why is this error so important? Because life expectancy at birth in historical populations was massively skewed by staggering infant and child mortality rates. When we hear that medieval peasants had a life expectancy of 35, it wasn’t because a hard life in the fields meant dropping dead at 36; it was because a huge fraction of the population died of disease during childhood. A peasant who reached adulthood actually had pretty good odds of reaching 60. And this leads us to the <strong>second</strong> massive flaw in the design of the initial analysis.</p>
<p>A monarch, by definition, has already survived childhood to reach the age of their coronation. It makes little sense to compare the actual fulfilled lifespans of royal individuals who went on to become monarchs with life expectancy <strong>at birth</strong> in the year of their birth. What about all their siblings who were less lucky? <sup>2</sup> One simply cannot evaluate historical longevity without properly accounting for survival bias. It’s all about selection and the luck of surviving through the hazardous early years of life. Back in the day, infant, child and early-adult mortality <a href="https://ourworldindata.org/child-mortality-in-the-past" rel="nofollow" target="_blank">used to be so high</a> that it is hard for us to imagine how society functioned when half of all live-born children did not reach their teens. For the purpose of this reanalysis, we need to factor in a truth that is obvious once you spell it out: only those royals who survived to the date of their coronation became monarchs. <sup>3</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn2"><p><sup>2</sup> In data analysis this common fallacy is known as <a href="https://xkcd.com/1827" rel="nofollow" target="_blank"><em>survivorship bias</em></a>.</p></div><div id="fn3"><p><sup>3</sup> Interestingly, the age of UK monarchs’ coronation varied widely, from 9 years for Queen Victoria to 59 years for King Edward VII.</p></div></div></section>
<section id="a-demographically-correct-approach" class="level1 page-columns page-full">
<h1>A demographically correct approach</h1>
<p>So, what should a proper comparison look like if we still want to evaluate whether royals lived exceptionally long compared to their subjects? The methodological corrections are straightforward: 1) Instead of period life tables, we should look at cohort life tables (also obtained from the <a href="https://www.mortality.org/" rel="nofollow" target="_blank">Human Mortality Database</a>); 2) As the comparison population, we need to look at the people who were born in the same year as the monarch in question and who survived at least to the age at which this monarch was crowned. We compare the monarch’s lifespan against the <strong>remaining cohort life expectancy</strong> of their birth cohort <em>at the exact age of the monarch’s coronation</em>. Correcting for these two errors, we obtained the demographically correct results below. <sup>4</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn4"><p><sup>4</sup> Let me re-iterate, just in case: we do not claim that this is a good way of researching the royal premium in survival. Our aim here is to correct the fundamental demographic flaws in the original widely circulated piece.</p></div></div><p><img src="https://i1.wp.com/ikashnitsky.phd/2026/royal-longevity/royal-survival.png?w=578&#038;ssl=1" class="img-fluid" data-recalc-dims="1"></p>
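To make the correction concrete, here is a minimal sketch of how remaining cohort life expectancy at a given age can be computed from cohort survivorship. This is illustrative only – the function, the `age`/`lx` column names, and the constant-hazard toy cohort are assumptions for this example, not the code used in the actual analysis.

```r
# Sketch: remaining cohort life expectancy e(x), i.e. the average number of
# years lived after age x by cohort members who survived to age x.
# `lt` is assumed to be a data.frame with columns `age` and `lx` (survivors);
# these names are illustrative, not taken from the HMD files themselves.
remaining_cohort_e <- function(lt, x) {
  lt <- lt[lt$age >= x, ]
  # person-years lived after age x, approximated by the trapezoid rule
  person_years <- sum((head(lt$lx, -1) + tail(lt$lx, -1)) / 2)
  person_years / lt$lx[1]
}

# Toy cohort with a constant hazard of 0.05, so e(0) is close to 1/0.05 = 20
toy <- data.frame(age = 0:200, lx = exp(-0.05 * (0:200)))
remaining_cohort_e(toy, 0)  # approximately 20
```

Note that with a constant hazard the remaining expectancy is the same at every age; with real human mortality, conditioning on survival to the coronation age raises the expectancy substantially, which is exactly the correction at stake here.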
<p>The results? Yes, royals still enjoy a survival advantage over the general population. This is hardly a surprise – living in extreme privilege gives you access to the best diet, living environments, and medical care of your era. But the sensational headline from Olshansky’s piece no longer holds. Instead of the claimed 30-year advantage, we see a more modest 7.7 years of extra survival, on average, across the 12 monarchs and their spouses.</p>
<p>And let’s highlight again that even this corrected figure does not convincingly show that royals live much longer than their subjects. We are still comparing a summary of 12 individual lifespans against population-level demographic averages. Human lifespans vary. <sup>5</sup></p>
<div class="no-row-height column-margin column-container"><div id="fn5"><p><sup>5</sup> In a <a href="http://doi.org/10.4054/DemRes.2021.44.35" rel="nofollow" target="_blank">recent article</a> we introduced a new outsurvival measure to study differences in longevity between populations, taking into account lifespan inequality.</p></div></div><p>The longevity premium for royalty depends on the age of coronation – the younger the monarch begins to reign, the fewer of those born in their year of birth have died. Thus, the biggest differences in longevity are for those monarchs who ascended the throne very young, such as Queen Victoria or Queen Elizabeth II. When Queen Elizabeth II was crowned at age 25, 88% of her birth cohort was still alive. In comparison, when King Edward VII was crowned at age 59, only 37% of his birth cohort was alive.</p>
<p>Another way to frame the comparison is to calculate the percentage of the monarch’s birth cohort who were alive at the coronation and whom the monarch subsequently outlived. This is a sort of p-score for the monarch’s longevity – how “well” he or she “performed” in a fair comparison with their birth cohort.</p>
<p><img src="https://i0.wp.com/ikashnitsky.phd/2026/royal-longevity/royal-survival-relative.png?w=578&#038;ssl=1" class="img-fluid" data-recalc-dims="1"></p>
<p>King Edward VII outlived only 30% of the males his age who were alive at his coronation: 70% of those peers were still alive at his funeral. Prince Philip, by contrast, outlived 99.5% of the UK males born in 1921 who lived at least until 1952.</p>
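This share can be sketched in a couple of lines. The constant-hazard survivorship function below is purely illustrative (it is not the UK cohort data), so the resulting number will not match the 30% figure; what matters is the form of the calculation.

```r
# Sketch: share of the birth cohort alive at coronation age who died before
# the monarch did, i.e. 1 - l(death age) / l(coronation age).
# `lx` here is a toy survivorship function, not the HMD cohort data.
outlived_share <- function(lx, coronation_age, death_age) {
  1 - lx(death_age) / lx(coronation_age)
}

lx <- function(age) exp(-0.05 * age)  # constant-hazard toy cohort
outlived_share(lx, coronation_age = 59, death_age = 68)  # about 0.36 here
```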
</section>
<section id="bonus" class="level1">
<h1>Bonus</h1>
<p>For dedicated readers, we offer a third plot that combines all the data discussed in the text in one figure. We realise it may be slightly challenging to process, but we believe it provides a unique opportunity to see the whole data story “at a glance”.</p>
<p><img src="https://i2.wp.com/ikashnitsky.phd/2026/royal-longevity/royal-survival-combined.png?w=578&#038;ssl=1" class="preview-image img-fluid" data-recalc-dims="1"></p>
<p><em>In the plot</em>: The colored stripes start at the age of the monarch’s coronation and fade out as the remaining birth cohort dies out; the average survival of the reference cohorts is marked with white vertical ticks; survival to the coronation is annotated in red labels.</p>
<hr>
<div class="callout callout-style-default callout-tip callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Tip</span>Replication
</div>
</div>
<div class="callout-body-container callout-body">
<p>You can find the data and the <code>R</code> code to reproduce this re-analysis in <a href="https://gist.github.com/ikashnitsky/bec6af5ac0d57129a406ee5b5a522ce2" rel="nofollow" target="_blank">this GitHub gist</a>. The post is based on my earlier <a href="https://x.com/ikashnitsky/status/1382595760756244481" rel="nofollow" target="_blank">Twitter thread</a>.</p>
</div>
</div>



</section>


 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://ikashnitsky.phd/2026/royal-longevity/"> Ilya Kashnitsky</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/uk-monarchs-longevity-against-their-people-a-demographically-correct-reanalysis/">UK monarchs’ longevity against their people: a demographically correct reanalysis</a>]]></content:encoded>
					
		
		<enclosure url="https://ikashnitsky.phd/2026/royal-longevity/teaser.jpg" length="0" type="image/jpeg" />

		<post-id xmlns="com-wordpress:feed-additions:1">400222</post-id>	</item>
		<item>
		<title>Same model, better shape: why centering improves MCMC</title>
		<link>https://www.r-bloggers.com/2026/03/same-model-better-shape-why-centering-improves-mcmc/</link>
		
		<dc:creator><![CDATA[ouR data generation]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> The Emergency departments leading the transformation of Alzheimer’s and dementia care (ED-LEAD) study, which I have written about in the past, is approaching the end of its third year. This multifactorial design evaluates three independent, yet potenti...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/same-model-better-shape-why-centering-improves-mcmc/">Same model, better shape: why centering improves MCMC</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/"> ouR data generation</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>The <em>Emergency departments leading the transformation of Alzheimer’s and dementia care</em> (ED-LEAD) study, which I have written about in the <a href="https://www.rdatagen.net/post/2024-02-20-ensuring-balance-with-a-cluster-randomized-factorial-design/" rel="nofollow" target="_blank">past</a>, is approaching the end of its third year. This multifactorial design evaluates three independent, yet potentially synergistic, interventions aimed at improving care for persons living with dementia (PLWD) and their caregivers.</p>
<p>To estimate intervention effects, we are using what I’ve <a href="https://onlinelibrary.wiley.com/doi/full/10.1002/sim.70264" rel="nofollow" target="_blank">called</a> the <em>HEx-factor model</em>, a Bayesian hierarchical exchangeable factorial model. The original plan was to conduct all analyses using <a href="https://mc-stan.org/" rel="nofollow" target="_blank"><code>Stan</code></a>. However, we’ve run into a bit of a snafu. I’ve been working through the problem, and thought I’d share here.</p>




<p>The challenge turns out to be a computational one. Because the <em>ED-LEAD</em> analyses must be conducted on National Institute on Aging (NIA) Data LINKAGE servers, we are working in a somewhat restricted software environment, at least with respect to Bayesian data analysis. In particular, we have not been able to install or run <code>Stan</code>, which was our analytic engine of choice. This forced us to consider alternatives, and we turned to <code>JAGS</code>, which <em>is</em> available in the Linkage environment and certainly is well-suited for Bayesian hierarchical modeling.</p>
<p>At first glance, this might seem like a straightforward substitution. Both <code>Stan</code> and <code>JAGS</code> allow us to specify the same likelihood and priors. However, I quickly noticed that the models were not performing as well in <code>JAGS</code> as they had in <code>Stan</code>. It turns out that the samplers used in <code>JAGS</code> are more sensitive to posterior dependence than the Hamiltonian Monte Carlo (HMC) methods implemented in <code>Stan</code>.</p>
<p>I set out to understand and fix the problem, and found that a simple reparameterization—re-coding the binary treatment indicators—made a substantial difference. With this change, the <code>JAGS</code> sampler was able to explore the posterior distribution much more efficiently, yielding results comparable to those obtained with <code>Stan</code>.</p>
<p>To understand why this happens, I ran a series of simple simulations comparing the original and reparameterized versions of a basic two-way factorial model. That is what I present here.</p>
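As a quick illustration of the dependence at play (a sketch, not part of the ED-LEAD analysis itself): under 0/1 coding the interaction column is strongly correlated with the main-effect columns, which is exactly the kind of posterior dependence that Gibbs-style samplers handle poorly, whereas centering makes the columns nearly orthogonal in a balanced design.

```r
# With A, B ~ Bernoulli(0.5), the product A*B is substantially correlated
# with A (theoretical correlation about 0.58), but the centered product
# (A - 0.5)*(B - 0.5) is uncorrelated with (A - 0.5).
set.seed(1)
A <- rbinom(1e5, 1, 0.5)
B <- rbinom(1e5, 1, 0.5)

cor(A * B, A)                          # large, roughly 0.58
cor((A - 0.5) * (B - 0.5), A - 0.5)    # near zero
```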
<div id="the-setup" class="section level3">
<h3>The setup</h3>
<p>In models with binary predictors and interactions, it turns out that <em>centering</em> can have a surprisingly large impact on computation, even though it does not change the underlying model. To see this clearly, I’ll start with a simple two-factor logistic model:
<span class="math display">\[
\text{logit}\big[P(Y=1)\big] = \alpha+ \beta_a A + \beta_b B + \beta_{ab}AB
\]</span>
where <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span> are binary treatment indicators. I’ll compare this to the algebraically equivalent centered version:
<span class="math display">\[
\text{logit}\big[P(Y=1)\big] = \alpha^*+ \gamma_a A^* + \gamma_b B^* + \gamma_{ab}A^*B^*
\]</span>
where</p>
<p><span class="math display">\[
A^* = A - 0.5, \ \ \ B^* = B - 0.5.
\]</span></p>
<p>The scientific model is unchanged. The question is whether the sampler behaves differently.</p>
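One way to convince yourself of the equivalence before running any sampler: substituting A = A* + 0.5 and B = B* + 0.5 shows that the centered coefficients are linear combinations of the 0/1 coefficients, so the two linear predictors agree on every treatment cell. A minimal check (the coefficient values happen to match the defaults of the data-generating function used later in the post):

```r
# Map 0/1 coefficients to centered ones and confirm both parameterizations
# give the same linear predictor in all four treatment cells.
alpha <- -0.8; beta_a <- 0.5; beta_b <- 0.9; beta_ab <- -0.3

alpha_star <- alpha + 0.5 * beta_a + 0.5 * beta_b + 0.25 * beta_ab
gamma_a  <- beta_a + 0.5 * beta_ab
gamma_b  <- beta_b + 0.5 * beta_ab
gamma_ab <- beta_ab

cells <- expand.grid(A = 0:1, B = 0:1)
lp01 <- with(cells, alpha + beta_a * A + beta_b * B + beta_ab * A * B)
lpc  <- with(cells, alpha_star + gamma_a * (A - 0.5) + gamma_b * (B - 0.5) +
                      gamma_ab * (A - 0.5) * (B - 0.5))
all.equal(lp01, lpc)  # TRUE
```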
<div id="log-odds-ratios-under-each-parameterization" class="section level4">
<h4>Log odds ratios under each parameterization</h4>
<p>With 0/1 coding, the log-odds ratio for <span class="math inline">\(A\)</span> alone (that is, when <span class="math inline">\(B=0\)</span>) is simply
<span class="math display">\[
\begin{align*}
\text{lOR}_a(B=0) &#038;= (\alpha + \beta_a \cdot 1 + \beta_b \cdot 0 + \beta_{ab} \cdot 0) -(\alpha + \beta_a \cdot 0 + \beta_b \cdot 0 + \beta_{ab} \cdot 0) \\
&#038; = \beta_a
\end{align*}
\]</span>
Analogously, the log-odds ratio for <span class="math inline">\(B\)</span> alone is <span class="math inline">\(\beta_b\)</span>. And if we want to compare the combination of both <span class="math inline">\(A=1\)</span> and <span class="math inline">\(B=1\)</span> to the case where neither is activated, then
<span class="math display">\[
\begin{align*}
\text{lOR}_{ab} &#038;= (\alpha + \beta_a \cdot 1 + \beta_b \cdot 1 + \beta_{ab} \cdot 1) \\
&#038;\quad -
(\alpha + \beta_a \cdot 0 + \beta_b \cdot 0 + \beta_{ab} \cdot 0) \\
&#038;= \beta_a + \beta_b + \beta_{ab}
\end{align*}
\]</span>
If instead we center the predictors, defining <span class="math inline">\(A^* = A - 0.5\)</span> and <span class="math inline">\(B^* = B - 0.5\)</span>, then the log-odds ratio of exposure to <span class="math inline">\(A\)</span> without exposure to <span class="math inline">\(B\)</span> relative to exposure to neither is
<span class="math display">\[
\begin{align*}
\text{lOR}_a(B=0)
&#038;=
(\alpha^* + \gamma_a(0.5) + \gamma_b(-0.5) + \gamma_{ab}(0.5)(-0.5)) \\
&#038;\quad -
(\alpha^* + \gamma_a(-0.5) + \gamma_b(-0.5) + \gamma_{ab}(-0.5)(-0.5)) \\
&#038;=
(\alpha^* + 0.5\gamma_a - 0.5\gamma_b - 0.25\gamma_{ab}) \\
&#038;\quad -
(\alpha^* - 0.5\gamma_a - 0.5\gamma_b + 0.25\gamma_{ab}) \\
&#038;=
\gamma_a - 0.5\gamma_{ab}.
\end{align*}
\]</span>
Using the same logic we can show that
<span class="math display">\[
\text{lOR}_{b} = \gamma_{b} - 0.5 \gamma_{ab}
\]</span>
and
<span class="math display">\[
\text{lOR}_{ab} = \gamma_a + \gamma_b.
\]</span>
</p>
</div>
</div>
<div id="bayesian-models-using-jags" class="section level3">
<h3>Bayesian models using JAGS</h3>
<p>The Bayesian model is a simple logistic regression with an interaction term:</p>
<p><span class="math display">\[
\begin{align*}
Y_i &#038;\sim \text{Bernoulli}(p_i), \\
\text{logit}(p_i)
&#038;= \alpha + \beta_a A_i + \beta_b B_i + \beta_{ab} A_i B_i,
\end{align*}
\]</span>
Here are the prior distribution assumptions, using variance-based notation to align with JAGS, which parameterizes normal distributions in terms of precision:
<span class="math display">\[
\begin{align*}
\alpha &#038;\sim \mathcal{N}(0, 0.25^{-1}), \\
\beta_a &#038;\sim \mathcal{N}(0, 0.25^{-1}), \\
\beta_b &#038;\sim \mathcal{N}(0, 0.25^{-1}), \\
\beta_{ab} &#038;\sim \mathcal{N}(0, 25^{-1}).
\end{align*}
\]</span>
The centered model is similar, except that we replace the coefficients with <span class="math inline">\(\alpha^*\)</span> as well as <span class="math inline">\(\gamma_a\)</span>, <span class="math inline">\(\gamma_b\)</span>, <span class="math inline">\(\gamma_{ab}\)</span>, and define the predictors in terms of centered versions of <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span>.</p>
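One small practical note on that convention (a sketch, not study code): `dnorm(mu, tau)` in JAGS takes a precision `tau = 1/variance`, so the priors above correspond to the following standard deviations.

```r
# JAGS parameterizes dnorm(mean, precision), with precision = 1 / variance,
# so sd = 1 / sqrt(precision). Translating the priors above:
prec_to_sd <- function(tau) 1 / sqrt(tau)
prec_to_sd(0.25)  # main effects and intercept: sd = 2
prec_to_sd(25)    # interaction: sd = 0.2
```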
</div>
<div id="simulations" class="section level3">
<h3>Simulations</h3>
<p>Before we get started on the simulations, we need to load the necessary libraries and set the seed in case you want to replicate these results:</p>
<pre>library(simstudy)
library(data.table)
library(ggplot2)
library(rjags)
library(coda)
library(posterior)
library(broom)
library(gt)

RNGkind(&quot;Mersenne-Twister&quot;, &quot;Inversion&quot;, &quot;Rejection&quot;)
set.seed(824)</pre>
<div id="creating-a-single-data-set" class="section level4">
<h4>Creating a single data set</h4>
<p>Here is the data generation process for a single data set. The outcome <span class="math inline">\(Y\)</span> is generated using the binary parameterization of <span class="math inline">\(A\)</span> and <span class="math inline">\(B\)</span>:</p>
<pre>s_gen &lt;- function(n = 2000,
                    alpha = -0.8,
                    beta_a = 0.5,
                    beta_b = 0.9,
                    beta_ab = -0.3) {
  
  def &lt;- 
    defData(varname = &quot;A&quot;, formula = 0.5, dist = &quot;binary&quot;) |&gt;
    defData(varname = &quot;B&quot;, formula = 0.5, dist = &quot;binary&quot;) |&gt;
    defData(varname = &quot;AB&quot;, formula = &quot;A*B&quot;, dist = &quot;nonrandom&quot;) |&gt;
    defData(varname = &quot;A_c&quot;, formula = &quot;A - 0.5&quot;, dist = &quot;nonrandom&quot;) |&gt;
    defData(varname = &quot;B_c&quot;, formula = &quot;B - 0.5&quot;, dist = &quot;nonrandom&quot;) |&gt;
    defData(varname = &quot;AB_c&quot;, formula = &quot;A_c * B_c&quot;, dist = &quot;nonrandom&quot;) |&gt;
    defData(
      varname = &quot;Y&quot;, 
      formula = &quot;..alpha + ..beta_a * A + ..beta_b * B + ..beta_ab * AB&quot;,
      dist = &quot;binary&quot;, link = &quot;logit&quot;
    )
    
  genData(n, def)
  
}

dd &lt;- s_gen()</pre>
</div>
<div id="the-two-parameterizations-fit-the-same-model" class="section level4">
<h4>The two parameterizations fit the same model</h4>
<p>First, here is the frequentist check of both models. The fitted probabilities are identical, even though the coefficients differ.</p>
<pre>fit_01 &lt;- glm(Y ~ A * B, data = dd, family = binomial)
fit_c  &lt;- glm(Y ~ A_c * B_c, data = dd, family = binomial)

tidy(fit_01)
## # A tibble: 4 × 5
##   term        estimate std.error statistic  p.value
##   &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;
## 1 (Intercept)   -1.03     0.0995    -10.4  3.90e-25
## 2 A              0.835    0.135       6.18 6.34e-10
## 3 B              1.14     0.134       8.46 2.73e-17
## 4 A:B           -0.670    0.186      -3.61 3.11e- 4
tidy(fit_c)
## # A tibble: 4 × 5
##   term        estimate std.error statistic  p.value
##   &lt;chr&gt;          &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt;
## 1 (Intercept)   -0.212    0.0464     -4.57 4.91e- 6
## 2 A_c            0.501    0.0929      5.39 7.05e- 8
## 3 B_c            0.802    0.0929      8.63 6.07e-18
## 4 A_c:B_c       -0.670    0.186      -3.61 3.11e- 4</pre>
<p>From the 0/1-coded model, <span class="math inline">\(\text{lOR}_a = 0.835\)</span>, <span class="math inline">\(\text{lOR}_b = 1.14\)</span>, and <span class="math inline">\(\text{lOR}_{ab} = 0.835 + 1.14 - 0.67 = 1.305.\)</span></p>
<p>From the centered model,
<span class="math display">\[
\text{lOR}_a = 0.501 + 0.5 \times 0.670 = 0.836
\]</span>
<span class="math display">\[
\text{lOR}_b = 0.802 + 0.5 \times 0.670 = 1.137
\]</span>
<span class="math display">\[
\text{lOR}_{ab} = 0.501 + 0.802 = 1.303
\]</span>
So the coefficients themselves change under centering, but the underlying treatment contrasts do not.</p>
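<p>The equivalence of the two sets of contrasts is just algebra: substituting <code>A = A_c + 0.5</code> and <code>B = B_c + 0.5</code> into the 0/1-coded linear predictor shows that each centered main effect equals the original main effect plus half the interaction, while the interaction coefficient itself is unchanged. A quick numerical check (a Python sketch using the rounded coefficients quoted above, not part of the original analysis) confirms the identity:</p>

```python
# Coefficients from the 0/1-coded fit quoted above (rounded)
beta_a, beta_b, beta_ab = 0.835, 1.14, -0.670

# Substituting A = A_c + 0.5 and B = B_c + 0.5 into the linear
# predictor gives the implied centered-model coefficients:
gamma_a  = beta_a + 0.5 * beta_ab   # 0.835 - 0.335 = 0.500
gamma_b  = beta_b + 0.5 * beta_ab   # 1.140 - 0.335 = 0.805
gamma_ab = beta_ab                  # the interaction is unchanged

# The treatment contrasts agree under either parameterization
lor_a_01, lor_a_c   = beta_a, gamma_a - 0.5 * gamma_ab
lor_ab_01, lor_ab_c = beta_a + beta_b + beta_ab, gamma_a + gamma_b

print(lor_a_01, lor_a_c)     # both 0.835
print(lor_ab_01, lor_ab_c)   # both ~1.305
```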
</div>
<div id="specifying-the-bayesian-models-in-jags" class="section level4">
<h4>Specifying the Bayesian models in JAGS</h4>
<p>Next, we will see that we can recover the same treatment contrasts from two different Bayesian models, though sampling will be considerably more efficient with centering.</p>
<p>Here is the JAGS code for each model:</p>
<pre>model_01 &lt;- &quot;
model {
  for (i in 1:N) {
    Y[i] ~ dbern(p[i])
    logit(p[i]) &lt;- alpha + beta_a * A[i] + beta_b * B[i] + beta_ab * AB[i]
  }
  
  alpha   ~ dnorm(0, 0.25)
  beta_a  ~ dnorm(0, 0.25)
  beta_b  ~ dnorm(0, 0.25)
  beta_ab ~ dnorm(0, 25)
}
&quot;

model_c &lt;- &quot;
model {
  for (i in 1:N) {
    Y[i] ~ dbern(p[i])
    logit(p[i]) &lt;- alpha + gamma_a * A_c[i] + gamma_b * B_c[i] + gamma_ab * AB_c[i]
  }
  
  alpha   ~ dnorm(0, 0.25)
  gamma_a  ~ dnorm(0, 0.25)
  gamma_b  ~ dnorm(0, 0.25)
  gamma_ab ~ dnorm(0, 25)
}
&quot;</pre>
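<p>One point worth flagging for readers less familiar with <code>JAGS</code>: <code>dnorm</code> is parameterized by precision (the reciprocal of the variance), not by the standard deviation. So <code>dnorm(0, 0.25)</code> is a fairly diffuse Normal prior with sd 2, while <code>dnorm(0, 25)</code> is a tight prior with sd 0.2, which is what shrinks the interaction. A one-line conversion (sketched here in Python) makes this explicit:</p>

```python
import math

def precision_to_sd(tau):
    """Convert a JAGS-style precision tau = 1 / sd^2 to a standard deviation."""
    return 1.0 / math.sqrt(tau)

print(precision_to_sd(0.25))  # 2.0 -> diffuse priors on alpha and the main effects
print(precision_to_sd(25))    # 0.2 -> tight prior shrinking the interaction
```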
</div>
<div id="fitting-the-models" class="section level4">
<h4>Fitting the models</h4>
<p>The function <code>fit_jags</code> fits one of the two models just described:</p>
<pre>fit_jags &lt;- function(dat, model_string, centered = FALSE,
                     n_chains = 3, burn = 2000, n_iter = 5000) {
  
  if (centered) {
    jdat &lt;- as.list(dat[, .(Y, A_c, B_c, AB_c)])
    vars &lt;- c(&quot;alpha&quot;, &quot;gamma_a&quot;, &quot;gamma_b&quot;, &quot;gamma_ab&quot;)
  } else {
    jdat &lt;- as.list(dat[, .(Y, A, B, AB)])
    vars &lt;- c(&quot;alpha&quot;, &quot;beta_a&quot;, &quot;beta_b&quot;, &quot;beta_ab&quot;)
  }
  jdat$N &lt;- nrow(dat)
  
  mod &lt;- jags.model(
    textConnection(model_string),
    data = jdat,
    n.chains = n_chains,
    quiet = TRUE
  )
  
  update(mod, burn, progress.bar = &quot;none&quot;)
  
  samp &lt;- coda.samples(
    mod,
    variable.names = vars,
    n.iter = n_iter,
    progress.bar = &quot;none&quot;
  )
  
  samp
}</pre>
<p>Now, we can fit the models, collect the diagnostic data, and take a look at the results:</p>
<pre>samp_01 &lt;- fit_jags(dd, model_01, centered = FALSE)
samp_c  &lt;- fit_jags(dd, model_c, centered = TRUE)

diag_tbl &lt;- function(samp, model_name) {
  post &lt;- as_draws_df(samp)
  summ &lt;- summarise_draws(post)
  out &lt;- as.data.table(summ)
  out[, model := model_name]
  out[]
}

diag_01 &lt;- diag_tbl(samp_01, &quot;0/1-coded&quot;)
diag_c  &lt;- diag_tbl(samp_c, &quot;centered&quot;)</pre>
<p>Here are the summary statistics of the posterior distribution as well as the computational diagnostics:</p>
<div id="bxfxwqtyqe" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#bxfxwqtyqe table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#bxfxwqtyqe thead, #bxfxwqtyqe tbody, #bxfxwqtyqe tfoot, #bxfxwqtyqe tr, #bxfxwqtyqe td, #bxfxwqtyqe th {
  border-style: none;
}

#bxfxwqtyqe p {
  margin: 0;
  padding: 0;
}

#bxfxwqtyqe .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 15px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#bxfxwqtyqe .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#bxfxwqtyqe .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#bxfxwqtyqe .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#bxfxwqtyqe .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#bxfxwqtyqe .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#bxfxwqtyqe .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#bxfxwqtyqe .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#bxfxwqtyqe .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#bxfxwqtyqe .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#bxfxwqtyqe .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#bxfxwqtyqe .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#bxfxwqtyqe .gt_spanner_row {
  border-bottom-style: hidden;
}

#bxfxwqtyqe .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#bxfxwqtyqe .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#bxfxwqtyqe .gt_from_md > :first-child {
  margin-top: 0;
}

#bxfxwqtyqe .gt_from_md > :last-child {
  margin-bottom: 0;
}

#bxfxwqtyqe .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#bxfxwqtyqe .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#bxfxwqtyqe .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#bxfxwqtyqe .gt_row_group_first td {
  border-top-width: 2px;
}

#bxfxwqtyqe .gt_row_group_first th {
  border-top-width: 2px;
}

#bxfxwqtyqe .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#bxfxwqtyqe .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#bxfxwqtyqe .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#bxfxwqtyqe .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#bxfxwqtyqe .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#bxfxwqtyqe .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#bxfxwqtyqe .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#bxfxwqtyqe .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#bxfxwqtyqe .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#bxfxwqtyqe .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#bxfxwqtyqe .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#bxfxwqtyqe .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#bxfxwqtyqe .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#bxfxwqtyqe .gt_left {
  text-align: left;
}

#bxfxwqtyqe .gt_center {
  text-align: center;
}

#bxfxwqtyqe .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#bxfxwqtyqe .gt_font_normal {
  font-weight: normal;
}

#bxfxwqtyqe .gt_font_bold {
  font-weight: bold;
}

#bxfxwqtyqe .gt_font_italic {
  font-style: italic;
}

#bxfxwqtyqe .gt_super {
  font-size: 65%;
}

#bxfxwqtyqe .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#bxfxwqtyqe .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#bxfxwqtyqe .gt_indent_1 {
  text-indent: 5px;
}

#bxfxwqtyqe .gt_indent_2 {
  text-indent: 10px;
}

#bxfxwqtyqe .gt_indent_3 {
  text-indent: 15px;
}

#bxfxwqtyqe .gt_indent_4 {
  text-indent: 20px;
}

#bxfxwqtyqe .gt_indent_5 {
  text-indent: 25px;
}

#bxfxwqtyqe .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#bxfxwqtyqe div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>
<table class="gt_table" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
  <thead>
    <tr class="gt_col_headings">
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" scope="col" id="variable">Parameter</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="mean">Mean</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="median">Median</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="sd">SD</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="mad">MAD</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="q5">5th %tile</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="q95">95th %tile</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="rhat">R-hat</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="ess_bulk">ESS (bulk)</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" scope="col" id="ess_tail">ESS (tail)</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr class="gt_group_heading_row">
      <th colspan="10" class="gt_group_heading" style="font-weight: bold;" scope="colgroup" id="0/1-coded">0/1-coded</th>
    </tr>
    <tr class="gt_row_group_first"><td headers="0/1-coded  variable" class="gt_row gt_left">alpha</td>
<td headers="0/1-coded  mean" class="gt_row gt_right">-0.937</td>
<td headers="0/1-coded  median" class="gt_row gt_right">-0.937</td>
<td headers="0/1-coded  sd" class="gt_row gt_right">0.092</td>
<td headers="0/1-coded  mad" class="gt_row gt_right">0.093</td>
<td headers="0/1-coded  q5" class="gt_row gt_right">-1.088</td>
<td headers="0/1-coded  q95" class="gt_row gt_right">-0.786</td>
<td headers="0/1-coded  rhat" class="gt_row gt_right">1.001</td>
<td headers="0/1-coded  ess_bulk" class="gt_row gt_right">1359.836</td>
<td headers="0/1-coded  ess_tail" class="gt_row gt_right">2862.560</td></tr>
    <tr><td headers="0/1-coded  variable" class="gt_row gt_left">beta_a</td>
<td headers="0/1-coded  mean" class="gt_row gt_right">0.665</td>
<td headers="0/1-coded  median" class="gt_row gt_right">0.665</td>
<td headers="0/1-coded  sd" class="gt_row gt_right">0.117</td>
<td headers="0/1-coded  mad" class="gt_row gt_right">0.118</td>
<td headers="0/1-coded  q5" class="gt_row gt_right">0.472</td>
<td headers="0/1-coded  q95" class="gt_row gt_right">0.855</td>
<td headers="0/1-coded  rhat" class="gt_row gt_right">1.001</td>
<td headers="0/1-coded  ess_bulk" class="gt_row gt_right">1612.670</td>
<td headers="0/1-coded  ess_tail" class="gt_row gt_right">3084.048</td></tr>
    <tr><td headers="0/1-coded  variable" class="gt_row gt_left">beta_ab</td>
<td headers="0/1-coded  mean" class="gt_row gt_right">-0.352</td>
<td headers="0/1-coded  median" class="gt_row gt_right">-0.352</td>
<td headers="0/1-coded  sd" class="gt_row gt_right">0.137</td>
<td headers="0/1-coded  mad" class="gt_row gt_right">0.139</td>
<td headers="0/1-coded  q5" class="gt_row gt_right">-0.579</td>
<td headers="0/1-coded  q95" class="gt_row gt_right">-0.128</td>
<td headers="0/1-coded  rhat" class="gt_row gt_right">1.001</td>
<td headers="0/1-coded  ess_bulk" class="gt_row gt_right">1699.870</td>
<td headers="0/1-coded  ess_tail" class="gt_row gt_right">3617.854</td></tr>
    <tr><td headers="0/1-coded  variable" class="gt_row gt_left">beta_b</td>
<td headers="0/1-coded  mean" class="gt_row gt_right">0.969</td>
<td headers="0/1-coded  median" class="gt_row gt_right">0.968</td>
<td headers="0/1-coded  sd" class="gt_row gt_right">0.118</td>
<td headers="0/1-coded  mad" class="gt_row gt_right">0.119</td>
<td headers="0/1-coded  q5" class="gt_row gt_right">0.777</td>
<td headers="0/1-coded  q95" class="gt_row gt_right">1.164</td>
<td headers="0/1-coded  rhat" class="gt_row gt_right">1.001</td>
<td headers="0/1-coded  ess_bulk" class="gt_row gt_right">1430.007</td>
<td headers="0/1-coded  ess_tail" class="gt_row gt_right">3298.993</td></tr>
    <tr class="gt_group_heading_row">
      <th colspan="10" class="gt_group_heading" style="font-weight: bold;" scope="colgroup" id="centered">centered</th>
    </tr>
    <tr class="gt_row_group_first"><td headers="centered  variable" class="gt_row gt_left">alpha</td>
<td headers="centered  mean" class="gt_row gt_right">-0.210</td>
<td headers="centered  median" class="gt_row gt_right">-0.210</td>
<td headers="centered  sd" class="gt_row gt_right">0.047</td>
<td headers="centered  mad" class="gt_row gt_right">0.047</td>
<td headers="centered  q5" class="gt_row gt_right">-0.285</td>
<td headers="centered  q95" class="gt_row gt_right">-0.133</td>
<td headers="centered  rhat" class="gt_row gt_right">1.000</td>
<td headers="centered  ess_bulk" class="gt_row gt_right">9278.897</td>
<td headers="centered  ess_tail" class="gt_row gt_right">9114.590</td></tr>
    <tr><td headers="centered  variable" class="gt_row gt_left">gamma_a</td>
<td headers="centered  mean" class="gt_row gt_right">0.492</td>
<td headers="centered  median" class="gt_row gt_right">0.492</td>
<td headers="centered  sd" class="gt_row gt_right">0.094</td>
<td headers="centered  mad" class="gt_row gt_right">0.094</td>
<td headers="centered  q5" class="gt_row gt_right">0.337</td>
<td headers="centered  q95" class="gt_row gt_right">0.647</td>
<td headers="centered  rhat" class="gt_row gt_right">1.000</td>
<td headers="centered  ess_bulk" class="gt_row gt_right">9316.962</td>
<td headers="centered  ess_tail" class="gt_row gt_right">8853.810</td></tr>
    <tr><td headers="centered  variable" class="gt_row gt_left">gamma_ab</td>
<td headers="centered  mean" class="gt_row gt_right">-0.361</td>
<td headers="centered  median" class="gt_row gt_right">-0.362</td>
<td headers="centered  sd" class="gt_row gt_right">0.135</td>
<td headers="centered  mad" class="gt_row gt_right">0.134</td>
<td headers="centered  q5" class="gt_row gt_right">-0.582</td>
<td headers="centered  q95" class="gt_row gt_right">-0.140</td>
<td headers="centered  rhat" class="gt_row gt_right">1.000</td>
<td headers="centered  ess_bulk" class="gt_row gt_right">9167.199</td>
<td headers="centered  ess_tail" class="gt_row gt_right">9311.161</td></tr>
    <tr><td headers="centered  variable" class="gt_row gt_left">gamma_b</td>
<td headers="centered  mean" class="gt_row gt_right">0.795</td>
<td headers="centered  median" class="gt_row gt_right">0.795</td>
<td headers="centered  sd" class="gt_row gt_right">0.093</td>
<td headers="centered  mad" class="gt_row gt_right">0.095</td>
<td headers="centered  q5" class="gt_row gt_right">0.642</td>
<td headers="centered  q95" class="gt_row gt_right">0.946</td>
<td headers="centered  rhat" class="gt_row gt_right">1.001</td>
<td headers="centered  ess_bulk" class="gt_row gt_right">8969.964</td>
<td headers="centered  ess_tail" class="gt_row gt_right">8821.275</td></tr>
  </tbody>
  
</table>
</div>
<p>There are a few things to notice here. First, the Bayesian estimates for both the 0/1-coded and centered data are closer to zero than the GLM estimates above. The shrinkage is particularly strong for the interaction term, because we placed much tighter priors on <span class="math inline">\(\beta_{ab}\)</span> and <span class="math inline">\(\gamma_{ab}\)</span>, which pull the interaction estimates toward zero, exactly as we would expect.</p>
<p>Second, if we compare the two parameterizations, we see that the R-hat—essentially a measure of whether the chains have converged to the same distribution—is slightly lower for the centered data. There isn’t much to make of the difference here (both are very close to 1), but it does suggest slightly more stable behavior for the centered parameterization.</p>
<p>The biggest impact is on the bulk effective sample size (ESS), which reflects how much independent information the chains contain after accounting for autocorrelation. Even though we ran the same number of iterations, the centered model yields far larger ESS values, indicating much better mixing. The sampler explores the posterior much more efficiently under the centered parameterization, and in this case the improvement is quite dramatic. Importantly, these differences have nothing to do with the models themselves, since the likelihood is unchanged; rather, they reflect how easy it is for the sampler to navigate the posterior surface when the data are centered.</p>
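<p>A rough way to see why autocorrelation matters: for a chain that behaves like an AR(1) process with lag-1 autocorrelation <span class="math inline">\(\rho\)</span>, the effective sample size is approximately <span class="math inline">\(N(1-\rho)/(1+\rho)\)</span>. (This is only a heuristic; the ESS estimator used by <code>summarise_draws</code> is more sophisticated.) Plugging in illustrative values, not quantities estimated from these particular chains:</p>

```python
def ar1_ess(n_draws, rho):
    """Approximate ESS of an AR(1) chain with lag-1 autocorrelation rho:
    ESS = N * (1 - rho) / (1 + rho)."""
    return n_draws * (1 - rho) / (1 + rho)

n = 15000  # 3 chains x 5000 saved iterations, as in the fits above
print(round(ar1_ess(n, 0.80)))  # ~1700: a sticky chain, like the 0/1-coded model
print(round(ar1_ess(n, 0.20)))  # ~10000: a well-mixing chain, like the centered model
```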
<p>A comparison of the trace plots reinforces the stability that centering the data provides. The traces for the 0/1-coded data (on the left) are a bit more irregular, suggesting less efficient exploration of the posterior. In contrast, the centered parameterization (on the right) produces tighter, more stable traces with less autocorrelation, indicating that the chains are mixing more effectively. This aligns with the much larger effective sample sizes observed for the centered model.</p>
<p><img src="https://i0.wp.com/www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/code_and_data/trace_plot.png?w=578&#038;ssl=1" data-recalc-dims="1" />
Finally, we compare the estimates of the log-odds ratios from the two models, just as we did before with the GLM fits, and it is clear that the two Bayesian models also provide the same estimates of the contrasts:</p>
<pre>get_lor_summary &lt;- function(samp, model_name) {
  dt &lt;- as.data.table(as_draws_df(samp))
  
  if (model_name == &quot;0/1-coded&quot;) {
    dt[, lOR_A := beta_a]
    dt[, lOR_B := beta_b]
    dt[, lOR_AB := beta_a + beta_b + beta_ab]
  } else {
    dt[, lOR_A := gamma_a - 0.5 * gamma_ab]
    dt[, lOR_B := gamma_b - 0.5 * gamma_ab]
    dt[, lOR_AB := gamma_a + gamma_b]
  }
  
  dt[, .(
    mean_A = mean(lOR_A),
    mean_B = mean(lOR_B),
    mean_AB = mean(lOR_AB),
    sd_A = sd(lOR_A),
    sd_B = sd(lOR_B),
    sd_AB = sd(lOR_AB)
  )]
}

lor_01 &lt;- get_lor_summary(samp_01, &quot;0/1-coded&quot;)
lor_c  &lt;- get_lor_summary(samp_c,  &quot;centered&quot;)</pre>
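<p>The key design point in <code>get_lor_summary</code> is that each contrast is computed draw by draw and only then summarized, so the posterior uncertainty of a contrast automatically reflects how the coefficients co-vary. The same mechanics can be sketched in Python with synthetic stand-in draws (the real draws come from <code>coda.samples</code> and are correlated across parameters; these are drawn independently purely for illustration):</p>

```python
import random
import statistics

random.seed(0)
n_draws = 10000

# Synthetic stand-in draws, roughly matching the centered posterior
# means/sds reported above; independent here purely for illustration.
gamma_a  = [random.gauss(0.49, 0.09) for _ in range(n_draws)]
gamma_b  = [random.gauss(0.80, 0.09) for _ in range(n_draws)]
gamma_ab = [random.gauss(-0.36, 0.14) for _ in range(n_draws)]

# Each contrast is computed per draw, then summarized
lor_a  = [a - 0.5 * ab for a, ab in zip(gamma_a, gamma_ab)]
lor_ab = [a + b for a, b in zip(gamma_a, gamma_b)]

print(statistics.mean(lor_a), statistics.stdev(lor_a))
print(statistics.mean(lor_ab), statistics.stdev(lor_ab))
```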
<div id="xwzeufbwmd" style="padding-left:0px;padding-right:0px;padding-top:10px;padding-bottom:10px;overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>#xwzeufbwmd table {
  font-family: system-ui, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol', 'Noto Color Emoji';
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
}

#xwzeufbwmd thead, #xwzeufbwmd tbody, #xwzeufbwmd tfoot, #xwzeufbwmd tr, #xwzeufbwmd td, #xwzeufbwmd th {
  border-style: none;
}

#xwzeufbwmd p {
  margin: 0;
  padding: 0;
}

#xwzeufbwmd .gt_table {
  display: table;
  border-collapse: collapse;
  line-height: normal;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

#xwzeufbwmd .gt_caption {
  padding-top: 4px;
  padding-bottom: 4px;
}

#xwzeufbwmd .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

#xwzeufbwmd .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 3px;
  padding-bottom: 5px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

#xwzeufbwmd .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#xwzeufbwmd .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#xwzeufbwmd .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

#xwzeufbwmd .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

#xwzeufbwmd .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

#xwzeufbwmd .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

#xwzeufbwmd .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

#xwzeufbwmd .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

#xwzeufbwmd .gt_spanner_row {
  border-bottom-style: hidden;
}

#xwzeufbwmd .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  text-align: left;
}

#xwzeufbwmd .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

#xwzeufbwmd .gt_from_md > :first-child {
  margin-top: 0;
}

#xwzeufbwmd .gt_from_md > :last-child {
  margin-bottom: 0;
}

#xwzeufbwmd .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

#xwzeufbwmd .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

#xwzeufbwmd .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

#xwzeufbwmd .gt_row_group_first td {
  border-top-width: 2px;
}

#xwzeufbwmd .gt_row_group_first th {
  border-top-width: 2px;
}

#xwzeufbwmd .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#xwzeufbwmd .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

#xwzeufbwmd .gt_first_summary_row.thick {
  border-top-width: 2px;
}

#xwzeufbwmd .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#xwzeufbwmd .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

#xwzeufbwmd .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

#xwzeufbwmd .gt_last_grand_summary_row_top {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: double;
  border-bottom-width: 6px;
  border-bottom-color: #D3D3D3;
}

#xwzeufbwmd .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

#xwzeufbwmd .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

#xwzeufbwmd .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#xwzeufbwmd .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#xwzeufbwmd .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

#xwzeufbwmd .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

#xwzeufbwmd .gt_left {
  text-align: left;
}

#xwzeufbwmd .gt_center {
  text-align: center;
}

#xwzeufbwmd .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

#xwzeufbwmd .gt_font_normal {
  font-weight: normal;
}

#xwzeufbwmd .gt_font_bold {
  font-weight: bold;
}

#xwzeufbwmd .gt_font_italic {
  font-style: italic;
}

#xwzeufbwmd .gt_super {
  font-size: 65%;
}

#xwzeufbwmd .gt_footnote_marks {
  font-size: 75%;
  vertical-align: 0.4em;
  position: initial;
}

#xwzeufbwmd .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

#xwzeufbwmd .gt_indent_1 {
  text-indent: 5px;
}

#xwzeufbwmd .gt_indent_2 {
  text-indent: 10px;
}

#xwzeufbwmd .gt_indent_3 {
  text-indent: 15px;
}

#xwzeufbwmd .gt_indent_4 {
  text-indent: 20px;
}

#xwzeufbwmd .gt_indent_5 {
  text-indent: 25px;
}

#xwzeufbwmd .katex-display {
  display: inline-flex !important;
  margin-bottom: 0.75em !important;
}

#xwzeufbwmd div.Reactable > div.rt-table > div.rt-thead > div.rt-tr.rt-tr-group-header > div.rt-th-group:after {
  height: 0px !important;
}
</style>
<table class="gt_table" data-quarto-disable-processing="false" data-quarto-bootstrap="false">
  <thead>
    <tr class="gt_col_headings">
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" scope="col" id="model"></th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1" scope="col" id="A">log OR A</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1" scope="col" id="B">log OR B</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1" scope="col" id="AB">log OR AB</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><td headers="model" class="gt_row gt_left">0/1-coded</td>
<td headers="A" class="gt_row gt_center">0.672 (0.115)</td>
<td headers="B" class="gt_row gt_center">0.973 (0.116)</td>
<td headers="AB" class="gt_row gt_center">1.286 (0.132)</td></tr>
    <tr><td headers="model" class="gt_row gt_left">centered</td>
<td headers="A" class="gt_row gt_center">0.672 (0.116)</td>
<td headers="B" class="gt_row gt_center">0.975 (0.117)</td>
<td headers="AB" class="gt_row gt_center">1.288 (0.134)</td></tr>
  </tbody>
  
</table>
</div>
</div>
</div>
<div id="a-larger-simulation-experiment" class="section level3">
<h3>A larger simulation experiment</h3>
<p>A single data set can be misleading. So next I’ll repeat this 500 times and compare the two parameterizations across simulations. In each iteration, I generate a data set with 2000 observations, fit both models—the one with 0/1 coding and the other with centered coding—using <code>JAGS</code>, and collect summary statistics of the posteriors from each model: mean, median, standard deviation, median absolute deviation, 5th percentile, 95th percentile, R-hat, bulk ESS, and tail ESS.</p>
<pre>one_run &lt;- function(
  n = 2000,
  truth = c(alpha = -0.8, beta_a = 0.5, beta_b = 0.9, beta_ab = -0.3),
  n_chains = 3,
  burn = 1000,
  n_iter = 3000
) {
  
  dd &lt;- s_gen(
    n = n,
    alpha = truth[&quot;alpha&quot;],
    beta_a = truth[&quot;beta_a&quot;],
    beta_b = truth[&quot;beta_b&quot;],
    beta_ab = truth[&quot;beta_ab&quot;]
  )
  
  samp_01 &lt;- fit_jags(
    dd, model_01, centered = FALSE, 
    n_chains = n_chains, burn = burn, n_iter = n_iter
  )
  
  samp_c &lt;- fit_jags(
    dd, model_c, centered = TRUE, 
    n_chains = n_chains, burn = burn, n_iter = n_iter
  )
  
  get_metrics &lt;- function(samp, model_name) {
    post &lt;- as_draws_df(samp)
    summ &lt;- as.data.table(summarise_draws(post))
    summ[, model := model_name]
    summ[]
  }
  
  out &lt;- rbindlist(list(
    get_metrics(samp_01, &quot;0/1-coded&quot;),
    get_metrics(samp_c,  &quot;centered&quot;)
  ))
  
  out[]
}

nsim &lt;- 500

sim_res &lt;- rbindlist(mclapply(seq_len(nsim), function(i) {
  out &lt;- one_run()
  out[, sim := i]
  out[]
}, mc.cores = 5))</pre>
<p>Earlier we saw that, for a single data set, there was not much difference in R-hat (essentially a measure of whether the chains have converged to the same distribution) between the two models. Over repeated data sets, however, a more interesting picture emerges. The figure below shows that while <em>R-hat</em> for the 0/1-coded model is quite low, <em>R-hat</em> for the centered model is lower still and much more consistent, suggesting that mixing is stronger in the centered model.</p>
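<p>To make the diagnostic concrete, here is a minimal, self-contained sketch of how split R-hat is computed, in base R. This is an illustration only; in the simulation above, R-hat comes from <code>summarise_draws()</code>.</p>

```r
# Split R-hat by hand (illustration only; the post uses posterior::summarise_draws()).
# Each chain is split in half; R-hat compares between-half-chain variance (B)
# to within-half-chain variance (W). Values near 1 indicate convergence.
split_rhat <- function(chains) {
  n <- nrow(chains) %/% 2
  halves <- cbind(chains[1:n, , drop = FALSE],
                  chains[(n + 1):(2 * n), , drop = FALSE])
  W <- mean(apply(halves, 2, var))        # within-chain variance
  B <- n * var(colMeans(halves))          # between-chain variance
  var_plus <- (n - 1) / n * W + B / n     # pooled variance estimate
  sqrt(var_plus / W)
}

set.seed(1)
well_mixed <- matrix(rnorm(3000), ncol = 3)          # 3 chains, same target
stuck <- well_mixed + rep(c(0, 0, 2), each = 1000)   # third chain shifted
c(split_rhat(well_mixed), split_rhat(stuck))         # ~1 vs. clearly above 1
```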
<p><img src="https://i0.wp.com/www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/code_and_data/plot_rhat_V1.png?w=75%25&#038;ssl=1" alt="" data-recalc-dims="1" /></p>
<p>The next figure shows the distribution of ratios of <em>bulk ESS</em> in the centered model relative to the 0/1-coded model. If the two models had the same effective sample size, we would expect those ratios to cluster near one. Instead, they are mostly greater than five, confirming what we saw for the individual data set.</p>
<p><img src="https://i0.wp.com/www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/code_and_data/plot_ess_V1.png?w=75%25&#038;ssl=1" alt="" data-recalc-dims="1" /></p>
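<p>The ratios plotted above can be computed directly from <code>sim_res</code>. Here is a sketch with <code>data.table</code>, using a toy stand-in for <code>sim_res</code> (the column names follow <code>summarise_draws()</code>; the real object comes from the <code>mclapply()</code> loop above):</p>

```r
library(data.table)

# Toy stand-in for sim_res; in the post it is built by the simulation loop.
sim_res <- data.table(
  sim      = rep(1:2, each = 2),
  variable = "beta_ab",
  model    = rep(c("0/1-coded", "centered"), times = 2),
  ess_bulk = c(400, 2600, 350, 2450)
)

# One row per (simulation, parameter), one bulk-ESS column per model
ess_wide <- dcast(sim_res, sim + variable ~ model, value.var = "ess_bulk")
ess_wide[, ratio := centered / `0/1-coded`]
ess_wide[, .(median_ratio = median(ratio)), by = variable]
```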
<p>The key issue is posterior dependence among parameters: when parameters are highly correlated, the sampler will explore narrower regions in the posterior, which slows mixing.</p>
</div>
<div id="understanding-what-is-driving-the-performance" class="section level3">
<h3>Understanding what is driving the performance</h3>
<p>To better understand this, we can look directly at the dependence structure of the posterior draws. Correlation plots (where each point is a draw from the posterior) help explain what is driving these differences in performance. Under the 0/1-coded parameterization, the posterior exhibits strong dependence among parameters. Several pairs of coefficients show substantial correlations, reflecting the fact that different combinations of parameters can produce similar fitted values. In geometric terms, the joint posterior has an elongated, highly correlated structure. This is evident in the pairwise scatter plots, where draws fall along narrow, tilted bands rather than forming roughly circular clouds.</p>
<p><img src="https://i1.wp.com/www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/code_and_data/plot_cor_01_V2.png?w=75%25&#038;ssl=1" alt="" data-recalc-dims="1" /></p>
<p>This geometry makes life difficult for the sampler. Exploring a narrower region requires smaller, correlated steps, which leads to high autocorrelation and, ultimately, low effective sample sizes.</p>
<p>In contrast, the centered parameterization produces a posterior that is nearly uncorrelated. The coefficients capture more distinct aspects of the model, and the resulting posterior is much more spherical. This greatly simplifies the exploration of the parameter space, allowing the sampler to move more freely.</p>
<p><img src="https://i1.wp.com/www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/code_and_data/plot_cor_c_V3.png?w=75%25&#038;ssl=1" alt="" data-recalc-dims="1" /></p>
<p>The key point is that centering does not change the model or the scientific conclusions. It changes the geometry of the posterior distribution, and that change can have a dramatic impact on computational performance. In effect, centering makes the parameters closer to orthogonal in the posterior, reducing interference among them and improving both statistical and computational behavior.</p>
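<p>The orthogonality claim can be seen in miniature without any MCMC. Under a normal approximation, the posterior correlations of the coefficients look like <code>cov2cor(solve(crossprod(X)))</code>. Here is a base R sketch for a balanced 2×2 factorial, an illustration of the geometry rather than the JAGS model itself:</p>

```r
# Approximate posterior correlation structure implied by each design matrix.
a <- rep(0:1, each = 2)
b <- rep(0:1, times = 2)

X_01 <- cbind(`(Intercept)` = 1, a = a, b = b, ab = a * b)     # 0/1-coded
ac <- a - mean(a); bc <- b - mean(b)
X_c <- cbind(`(Intercept)` = 1, a = ac, b = bc, ab = ac * bc)  # centered

round(cov2cor(solve(crossprod(X_01))), 2)  # off-diagonals up to +/- 0.71
round(cov2cor(solve(crossprod(X_c))), 2)   # identity: orthogonal parameters
```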
<p>In the ED-LEAD study, where we are fitting hierarchical factorial models with multiple intervention components, this shift in parameterization is critical. Centering the treatment indicators leads to more stable estimation and far more efficient sampling, which is particularly important given our reliance on <code>JAGS</code>. Unlike Hamiltonian Monte Carlo (as implemented in <code>Stan</code>), which can handle correlated posteriors more effectively, the Gibbs and Metropolis-based updates used by <code>JAGS</code> are much more sensitive to posterior dependence. Improving the geometry of the posterior seems to be critical for good performance in this setting.</p>
<p><small><font color="darkkhaki">
Support: This work was supported in part by the National Institute on Aging (NIA) of the National Institutes of Health under Award Number U19AG078105, which funds the <em>Emergency departments leading the transformation of Alzheimer’s and dementia care</em> (ED-LEAD) study. The author, the leader of the Statistics Analysis Core, was the sole writer of this blog post and has no conflicts. The content is solely the responsibility of the author and does not necessarily represent the official views of the National Institutes of Health.
</font></small></p>
</div>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.rdatagen.net/post/2026-03-31-centering-binary-predictors-can-improve-bayesian-computation/"> ouR data generation</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/same-model-better-shape-why-centering-improves-mcmc/">Same model, better shape: why centering improves MCMC</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400210</post-id>	</item>
		<item>
		<title>Better Git diff with difftastic</title>
		<link>https://www.r-bloggers.com/2026/03/better-git-diff-with-difftastic/</link>
		
		<dc:creator><![CDATA[Maëlle&#039;s R blog on Maëlle Salmon&#039;s personal website]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://masalmon.eu/2026/03/30/difftastic/</guid>

					<description><![CDATA[<p>I’m currently on a quest to better know and understand treesitter-based tooling for R.<br />
To make it short, treesitter is a tool for parsing code, for instance recognizing what is a function, an argument, a logical in a string of code.<br />
With tools bu...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/better-git-diff-with-difftastic/">Better Git diff with difftastic</a>]]></description>
					<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://masalmon.eu/2026/03/30/difftastic/"> Maëlle&#039;s R blog on Maëlle Salmon&#039;s personal website</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>I’m currently on a quest to better know and understand treesitter-based tooling for R.
To make it short, treesitter is a tool for parsing code, for instance recognizing what is a function, an argument, a logical in a string of code.
With tools built upon treesitter you can <a href="https://emilhvitfeldt.com/post/ast-grep-r-claude-code/" rel="nofollow" target="_blank">search</a>, <a href="https://posit-dev.github.io/air/" rel="nofollow" target="_blank">reformat</a>, <a href="https://jarl.etiennebacher.com/" rel="nofollow" target="_blank">lint and fix</a>, etc. your code.
Exciting stuff, running locally and deterministically on your machine.</p>
<p>Speaking of “etc.”, <a href="https://www.etiennebacher.com/" rel="nofollow" target="_blank">Etienne Bacher</a> helpfully suggested I also look at treesitter-based tooling for <em>other languages</em> to see what’s still missing in our ecosystem.
This is how I stumbled upon difftastic by Wilfred Hughes, “a structural diff tool that understands syntax”. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/2728.png" alt="✨" class="wp-smiley" style="height: 1em; max-height: 1em;" />
This means that difftastic doesn’t only compare lines or “words” but actual syntax, by looking at the lines around the lines that changed (by default, 3).
Even better, it understands R out of the box<sup id="fnref:1"><a href="https://masalmon.eu/2026/03/30/difftastic/#fn:1" class="footnote-ref" role="doc-noteref" rel="nofollow" target="_blank">1</a></sup>.</p>
<p><em>Many thanks to Etienne Bacher not only for making me discover difftastic but also for useful feedback on this post!</em></p>
<h2 id="installing-difftastic">Installing difftastic</h2>
<p>To install difftastic I downloaded a binary file for my system from the releases of the GitHub repository,
as <a href="https://difftastic.wilfred.me.uk/installation.html" rel="nofollow" target="_blank">documented in the manual</a>.</p>
<h2 id="difftastic-on-two-files">difftastic on two files</h2>
<p>You can run difftastic on two files, a bit like you would use the <a href="https://waldo.r-lib.org/" rel="nofollow" target="_blank">waldo</a> R package on two objects.</p>
<p>Let’s compare:</p>
<pre>a &lt;- gsub(&quot;bad&quot;, &quot;good&quot;, x)
</pre><p>to</p>
<pre>a &lt;- stringr::str_replace(x, &quot;bad&quot;, &quot;good&quot;)
</pre><p>respectively saved in <code>old.R</code> and <code>new.R</code>.
The CLI is called <code>difft</code>, not difftastic.
I use the “inline” display rather than the two columns default in order to save horizontal space.</p>
<pre>difft old.R new.R --display inline
</pre><p>We’d get to this nice looking diff:</p>
<figure>
    <img src="https://i2.wp.com/masalmon.eu/2026/03/30/difftastic/oldnew.png?w=578&#038;ssl=1"
         alt="diff of the two lines of code, where &#39;gsub&#39; and &#39;, x&#39; are in red then &#39;stringr::str_replace&#39; and &#39;x&#39; in green" data-recalc-dims="1"/> 
</figure>

<p>The parentheses and <code>&quot;bad&quot;</code> and <code>&quot;good&quot;</code> arguments are ignored.</p>
<p>We can also get the JSON version of this diff; this is an unstable feature whose usage requires setting an environment variable:</p>
<pre>export DFT_UNSTABLE=yes
difft old.R new.R --display json
</pre><p>This gets us</p>
<pre>{&quot;aligned_lines&quot;:[[0,0],[1,1]],&quot;chunks&quot;:[[{&quot;lhs&quot;:{&quot;line_number&quot;:0,&quot;changes&quot;:[{&quot;start&quot;:5,&quot;end&quot;:9,&quot;content&quot;:&quot;gsub&quot;,&quot;highlight&quot;:&quot;normal&quot;},{&quot;start&quot;:23,&quot;end&quot;:24,&quot;content&quot;:&quot;,&quot;,&quot;highlight&quot;:&quot;normal&quot;},{&quot;start&quot;:25,&quot;end&quot;:26,&quot;content&quot;:&quot;x&quot;,&quot;highlight&quot;:&quot;normal&quot;}]},&quot;rhs&quot;:{&quot;line_number&quot;:0,&quot;changes&quot;:[{&quot;start&quot;:5,&quot;end&quot;:12,&quot;content&quot;:&quot;stringr&quot;,&quot;highlight&quot;:&quot;normal&quot;},{&quot;start&quot;:12,&quot;end&quot;:14,&quot;content&quot;:&quot;::&quot;,&quot;highlight&quot;:&quot;keyword&quot;},{&quot;start&quot;:14,&quot;end&quot;:25,&quot;content&quot;:&quot;str_replace&quot;,&quot;highlight&quot;:&quot;normal&quot;},{&quot;start&quot;:26,&quot;end&quot;:27,&quot;content&quot;:&quot;x&quot;,&quot;highlight&quot;:&quot;normal&quot;},{&quot;start&quot;:27,&quot;end&quot;:28,&quot;content&quot;:&quot;,&quot;,&quot;highlight&quot;:&quot;normal&quot;}]}}]],&quot;language&quot;:&quot;R&quot;,&quot;path&quot;:&quot;content/post/2026-03-26-difftastic/new.R&quot;,&quot;status&quot;:&quot;changed&quot;}
</pre><p>Now, none of this is very useful to me, because I would never compare files in this way…
I use version control!</p>
<h2 id="difftastic-with-git">difftastic with Git</h2>
<p>We can set difftastic as the external diff tool for Git globally or for the current project.</p>
<p>For instance with the gert R package, to set it locally:</p>
<pre>gert::git_config_set(&quot;diff.external&quot;, &quot;difft&quot;)
</pre><p>If I want to use the inline display I’d set:</p>
<pre>gert::git_config_set(&quot;diff.external&quot;, &quot;difft --display inline&quot;)
</pre><p>Then <code>git diff</code> will by default use difftastic.
Most interestingly for me, <code>git show --ext-diff</code> will use difftastic.
I never use <code>git diff</code> directly but I do look at more or less recent commits a lot.</p>
<p>Say I am interested in the <a href="https://github.com/r-lib/roxygen2/commit/7a1dd39866699a2b0a034bb15244c07698a1e2e7" rel="nofollow" target="_blank">commit</a> that removed roxygen2’s dependency on stringi; I’ll run:</p>
<pre>git show 7a1dd39866699a2b0a034bb15244c07698a1e2e7 --ext-diff
</pre><p>and get:</p>
<figure>
    <img src="https://i0.wp.com/masalmon.eu/2026/03/30/difftastic/strwrap.png?w=578&#038;ssl=1"
         alt="diff where the parentheses of a nested call are nicely highlighted" data-recalc-dims="1"/> 
</figure>

<p>This isn’t spectacular because this is a small diff, but I enjoy the highlighting of the parentheses of the removed nested call, and of the logical.</p>
<h2 id="cool-features-of-difftastic">Cool features of difftastic</h2>
<p>Building on two examples of the <a href="https://difftastic.wilfred.me.uk/" rel="nofollow" target="_blank">difftastic homepage</a>…</p>
<h3 id="ignoring-formatting-changes">Ignoring formatting changes</h3>
<p>Since formatters can so helpfully apply your formatting preferences,
reviewing formatting changes in a patch that’s about something else entirely is useless and annoying.
Imagine having a function definition that fits on a single line, then adding one argument to it.</p>
<p>Going from</p>
<pre>f &lt;- function(myarg1 = foo, myarg2 = bar) {}
</pre><p>to</p>
<pre>f &lt;- function(
  myarg1 = foo,
  myarg2 = bar,
  myarg3 = baz
) {}
</pre><p>Because the definition is now longer than 80 characters, your formatter might switch the definition to be on multiple lines.
But the actually interesting change is the addition of one argument.</p>
<p>Native Git diff<sup id="fnref:2"><a href="https://masalmon.eu/2026/03/30/difftastic/#fn:2" class="footnote-ref" role="doc-noteref" rel="nofollow" target="_blank">2</a></sup> would show:</p>
<figure>
    <img src="https://i0.wp.com/masalmon.eu/2026/03/30/difftastic/args.png?w=578&#038;ssl=1"
         alt="diff where all lines are highlighted because the function was reformatted, not only complemented with one argument" data-recalc-dims="1"/> 
</figure>

<p>Git with difftastic would show:</p>
<figure>
    <img src="https://i2.wp.com/masalmon.eu/2026/03/30/difftastic/args-better.png?w=578&#038;ssl=1"
         alt="diff where only the comma after `bar` and the line with the new argument are highlighted" data-recalc-dims="1"/> 
</figure>

<p>The matching of delimiters is why I found difftastic’s display of the roxygen2 commit more pleasing.</p>
<h3 id="matching-delimiters-in-wrappers">Matching delimiters in wrappers</h3>
<p>The Git diff can look a bit ugly when you simply move code from one function to the other.</p>
<p>Say we go from</p>
<pre>f &lt;- function() {
  1 + 1
}

</pre><p>to</p>
<pre>f &lt;- function() {
  g()
}

g &lt;- function() {
  1 + 1
}

</pre><p>Git diff would show:</p>
<figure>
    <img src="https://i1.wp.com/masalmon.eu/2026/03/30/difftastic/wrappers-bad.png?w=578&#038;ssl=1"
         alt="uncool diff that shows lines modified in both the wrapper and the function without matching delimiters" data-recalc-dims="1"/> 
</figure>

<p>Whereas Git with difftastic would show:</p>
<figure>
    <img src="https://i1.wp.com/masalmon.eu/2026/03/30/difftastic/wrappers-good.png?w=578&#038;ssl=1"
         alt="cool diff that shows `g` as a new function by highlighting its name and the left arrow, whereas the entire definiton of `f` is marked as changed." data-recalc-dims="1"/> 
</figure>

<h2 id="will-i-use-difftastic">Will I use difftastic?</h2>
<p>I really like the concept behind difftastic and the few Git commits I looked at with it rendered nicely.
Now, what’s <a href="https://github.com/Wilfred/difftastic#does-difftastic-integrate-with-my-favourite-tool" rel="nofollow" target="_blank">missing</a> for me to use difftastic a lot is its integration with the tools where I actually use Git:</p>
<ul>
<li>Positron including the GitLens extension;</li>
<li>GitHub Pull Request Files tab.</li>
</ul>
<p>In any case, I’ll continue learning about tools based on treesitter, some of which like <a href="https://posit-dev.github.io/air/" rel="nofollow" target="_blank">Air</a> and <a href="https://jarl.etiennebacher.com/" rel="nofollow" target="_blank">Jarl</a> I can already use directly from my IDE. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f638.png" alt="😸" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<section class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1" role="doc-endnote">
<p>It’s not every day we R developers look at the <a href="https://difftastic.wilfred.me.uk/" rel="nofollow" target="_blank">homepage</a> of a tool and see the R logo among the logos of other languages! <a href="https://masalmon.eu/2026/03/30/difftastic/#fnref:1" class="footnote-backref" role="doc-backlink" rel="nofollow" target="_blank"><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>To get the diff that Git would show me I ran <code>git diff --no-index old-args.R new-args.R --no-ext-diff</code>, cool trick I didn’t know about! Very glad I didn’t have to create a fake Git repo just for this. (<code>--no-ext-diff</code> because my diff in this repo would use difftastic by default!) <a href="https://masalmon.eu/2026/03/30/difftastic/#fnref:2" class="footnote-backref" role="doc-backlink" rel="nofollow" target="_blank"><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/21a9.png" alt="↩" class="wp-smiley" style="height: 1em; max-height: 1em;" />︎</a></p>
</li>
</ol>
</section>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://masalmon.eu/2026/03/30/difftastic/"> Maëlle&#039;s R blog on Maëlle Salmon&#039;s personal website</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/better-git-diff-with-difftastic/">Better Git diff with difftastic</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400178</post-id>	</item>
		<item>
		<title>rOpenSci News Digest, March 2026</title>
		<link>https://www.r-bloggers.com/2026/03/ropensci-news-digest-march-2026/</link>
		
		<dc:creator><![CDATA[rOpenSci]]></dc:creator>
		<pubDate>Mon, 30 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://ropensci.org/blog/2026/03/30/news-mars-2026/</guid>

					<description><![CDATA[<p>Dear rOpenSci friends, it’s time for our monthly news roundup!  You can read this post on our blog. Now let’s dive into the activity at and around rOpenSci!</p>
<p>rOpenSci HQ</p>
<p>rOpenSci Dev Guide 1.0.0: Trilingual and Improved<br />
rOpenSci Software...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/ropensci-news-digest-march-2026/">rOpenSci News Digest, March 2026</a>]]></description>
					<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://ropensci.org/blog/2026/03/30/news-mars-2026/"> rOpenSci - open tools for open science</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>Dear rOpenSci friends, it’s time for our monthly news roundup! You can read this post <a href="https://ropensci.org/blog/2026/03/30/news-mars-2026" rel="nofollow" target="_blank">on our blog</a>. Now let’s dive into the activity at and around rOpenSci!</p>
<h2>
rOpenSci HQ
</h2><h3>
rOpenSci Dev Guide 1.0.0: Trilingual and Improved
</h3><p>rOpenSci Software Peer Review’s guidance is gathered in an online book that keeps improving! It is now available in <a href="https://devguide.ropensci.org/" rel="nofollow" target="_blank">English</a>, <a href="https://devguide.ropensci.org/es/index.es.html" rel="nofollow" target="_blank">Spanish</a> and <a href="https://devguide.ropensci.org/pt/index.pt.html" rel="nofollow" target="_blank">Portuguese</a>. Read more in the <a href="https://ropensci.org/blog/2026/03/02/devguide-1.0.0/" rel="nofollow" target="_blank">release announcement</a></p>
<h3>
Champions Program Update
</h3><p>We are still going through the Champions selection process, and we’re excited to share that the new group of mentors has already been selected and is now actively reviewing Champions applications.</p>
<p>This cohort brings together a wonderful mix of returning Champions stepping into mentorship roles, mentors continuing their contributions, and new members joining the program. The 2026 mentors are Andrea Gómez Vargas, Pablo Paccioretti, Alber Hamersson Sánchez Ipia, Erick Isaac Navarro Delgado, Francisco Cardozo, Luis Verde Arregoitia, Monika Ávila Márquez, Guadalupe Pascal, Pao Corrales, and Elio Campitelli. Together, they represent a diverse and vibrant community across Colombia, Mexico, Argentina, Brazil, and Bolivia, with some currently based in Switzerland, Canada, the United States, and Australia. We’re very happy to see this growing, interconnected network supporting the next cohort of Champions.</p>
<h3>
R-Universe update
</h3><p>You can now download artifacts and log files from R-Universe without being logged in with a GitHub account, for example <a href="https://ropensci.r-universe.dev/opencv#checktable" rel="nofollow" target="_blank">https://ropensci.r-universe.dev/opencv#checktable</a>.</p>
<h3>
Software review and usage of AI tools
</h3><p>Authors submitting new software for <a href="https://ropensci.org/software-review/" rel="nofollow" target="_blank">peer review</a> are now required to explain potential usage of generative AI tools in their package development. All submission templates now include a mandatory check-box:</p>
<pre>- [ ] Generative AI tools were used to produce some of the material in this submission.
If so, please describe usage, and include links to any relevant aspects of your repository.
</pre>
<p>This is the start of our updates to accommodate generative AI tools in package development, as described in our <a href="https://ropensci.org/blog/2026/02/26/ropensci-ai-policy/" rel="nofollow" target="_blank">recent blog post</a>. The next phase will involve updates to our <a href="https://devguide.ropensci.org/" rel="nofollow" target="_blank"><em>Dev Guide</em></a>, explaining requirements and recommendations for authors, reviewers, and editors. All updates are intended to permit generative AI tools to be used in any useful way, while minimising the burden on those who volunteer their own time to keep our software peer review service running.</p>
<h3>
Software review bot updates
</h3><p>The <code>ropensci-review-bot</code> now delivers an initial report to all new software pre-submissions and submissions, identifying the five most similar packages from both all rOpenSci packages, and all CRAN packages. The matches are generated by our <a href="https://docs.ropensci.org/pkgmatch" rel="nofollow" target="_blank">ropensci-review-tools/pkgmatch package</a> (itself reviewed in <a href="https://github.com/ropensci/software-review/issues/671" rel="nofollow" target="_blank">this review issue</a>). Matching is based on an <a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf" rel="nofollow" target="_blank">“term frequency-inverse document frequency” algorithm</a>, using inverse document frequencies from all rOpenSci and CRAN packages. Similar package reports can also be manually triggered (by editors only) with <code>@ropensci-review-bot similar packages</code>, like in <a href="https://github.com/ropensci/software-review/issues/671#issuecomment-4117805740" rel="nofollow" target="_blank">this example for the pkgmatch package itself</a>.</p>
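<p>For intuition only, here is a miniature of the tf-idf idea in base R: documents become term-weight vectors, with weights that down-weight terms appearing everywhere, and similarity is the cosine between vectors. This is not pkgmatch’s actual implementation, just a sketch of the algorithm it builds on, with hypothetical package “descriptions”.</p>

```r
# Tiny tf-idf + cosine similarity demo (hypothetical package "descriptions").
docs <- list(
  pkg_a = c("parse", "xml", "files"),
  pkg_b = c("parse", "json", "files"),
  pkg_c = c("fit", "bayesian", "models")
)
vocab <- sort(unique(unlist(docs)))
tf <- t(sapply(docs, function(d) table(factor(d, levels = vocab))))
idf <- log(length(docs) / colSums(tf > 0))   # rare terms get more weight
tfidf <- sweep(tf, 2, idf, `*`)

cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
c(a_vs_b = cosine(tfidf["pkg_a", ], tfidf["pkg_b", ]),
  a_vs_c = cosine(tfidf["pkg_a", ], tfidf["pkg_c", ]))  # pkg_b is the closer match
```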
<h3>
Coworking
</h3><p>Read <a href="https://ropensci.org/blog/2023/06/21/coworking/" rel="nofollow" target="_blank">all about coworking</a>!</p>
<ul>
<li>Tuesday April 7th 2026, 9:00 Americas Pacific (16:00 UTC) <a href="https://ropensci.org/events/coworking-2026-04/" rel="nofollow" target="_blank">“Getting to know the CSID Network”</a> with <a href="https://ropensci.org/author/steffi-lazerte/" rel="nofollow" target="_blank">Steffi LaZerte</a> and cohosts <a href="https://ropensci.org/author/irene-ramos/" rel="nofollow" target="_blank">Irene Ramos</a> and <a href="https://ropensci.org/author/adamu-saleh-saidu" rel="nofollow" target="_blank">Adamu Saleh Saidu</a>.
<ul>
<li>Learn more about the <a href="https://csidnet.org/" rel="nofollow" target="_blank">CSID Network</a></li>
<li>Meet cohosts, Irene Ramos and Adamu Saleh Saidu, and learn more about the CSID Network and how you might get involved.</li>
</ul>
</li>
<li>Tuesday May 5th 2026, 9:00 Australia Western (01:00 UTC) <a href="https://ropensci.org/events/coworking-2026-05/" rel="nofollow" target="_blank">“Code Review with rOpenSci”</a> with <a href="https://ropensci.org/author/steffi-lazerte/" rel="nofollow" target="_blank">Steffi LaZerte</a> and cohost <a href="https://ropensci.org/author/liz-hare/" rel="nofollow" target="_blank">Liz Hare</a>.
<ul>
<li>Explore resources for Code Review</li>
<li>Sign up to volunteer to do <a href="https://airtable.com/app8dssb6a7PG6Vwj/shrnfDI2S9uuyxtDw" rel="nofollow" target="_blank">software peer-review</a> at rOpenSci</li>
<li>Meet cohost, Liz Hare, and discuss resources for Code Review with rOpenSci.</li>
</ul>
</li>
</ul>
<p>And remember, you can always cowork independently on work related to R, work on packages that tend to be neglected, or work on whatever you need to get done!</p>
<h2>
Software <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f4e6.png" alt="📦" class="wp-smiley" style="height: 1em; max-height: 1em;" />
</h2><h3>
New packages
</h3><p>The following package recently became a part of our software suite:</p>
<ul>
<li><a href="https://docs.ropensci.org/suwo" rel="nofollow" target="_blank">suwo</a>, developed by Marcelo Araya-Salas together with Jorge Elizondo-Calvo and Alejandro Rico-Guevara: Streamline searching/downloading of nature media files (e.g. audios, photos) from online repositories. The package offers functions for obtaining media metadata from online repositories, downloading associated media files and updating data sets with new records. It has been <a href="https://github.com/ropensci/software-review/issues/729" rel="nofollow" target="_blank">reviewed</a> by Eric R. Scott and Hugo Gruson.</li>
</ul>
<p>Discover <a href="https://ropensci.org/packages" rel="nofollow" target="_blank">more packages</a>, read more about <a href="https://ropensci.org/software-review" rel="nofollow" target="_blank">Software Peer Review</a>.</p>
<h3>
New versions
</h3><p>The following eleven packages have had an update since the last newsletter: <a href="https://docs.ropensci.org/cffr" title="Generate Citation File Format (cff) Metadata for R Packages" rel="nofollow" target="_blank">cffr</a> (<a href="https://github.com/ropensci/cffr/releases/tag/v1.3.0" rel="nofollow" target="_blank"><code>v1.3.0</code></a>), <a href="https://docs.ropensci.org/pkgmatch" title="Find R Packages Matching Either Descriptions or Other R Packages" rel="nofollow" target="_blank">pkgmatch</a> (<a href="https://github.com/ropensci-review-tools/pkgmatch/releases/tag/v0.5.2" rel="nofollow" target="_blank"><code>v0.5.2</code></a>), <a href="https://docs.ropensci.org/tarchetypes" title="Archetypes for Targets" rel="nofollow" target="_blank">tarchetypes</a> (<a href="https://github.com/ropensci/tarchetypes/releases/tag/0.14.1" rel="nofollow" target="_blank"><code>0.14.1</code></a>), <a href="https://docs.ropensci.org/rgbif" title="Interface to the Global Biodiversity Information Facility API" rel="nofollow" target="_blank">rgbif</a> (<a href="https://github.com/ropensci/rgbif/releases/tag/v3.8.5" rel="nofollow" target="_blank"><code>v3.8.5</code></a>), <a href="https://docs.ropensci.org/saperlipopette" title="Create Example Git Messes" rel="nofollow" target="_blank">saperlipopette</a> (<a href="https://github.com/ropensci-training/saperlipopette/releases/tag/v0.1.1" rel="nofollow" target="_blank"><code>v0.1.1</code></a>), <a href="https://docs.ropensci.org/gutenbergr" title="Download and Process Public Domain Works from Project Gutenberg" rel="nofollow" target="_blank">gutenbergr</a> (<a href="https://github.com/ropensci/gutenbergr/releases/tag/v0.5.0" rel="nofollow" target="_blank"><code>v0.5.0</code></a>), <a href="https://docs.ropensci.org/trud" title="Query the NHS TRUD API" rel="nofollow" target="_blank">trud</a> (<a href="https://github.com/ropensci/trud/releases/tag/v0.2.1" rel="nofollow" target="_blank"><code>v0.2.1</code></a>), <a 
href="https://docs.ropensci.org/naijR" title="Operations to Ease Data Analyses Specific to Nigeria" rel="nofollow" target="_blank">naijR</a> (<a href="https://github.com/ropensci/naijR/releases/tag/v0.7.0" rel="nofollow" target="_blank"><code>v0.7.0</code></a>), <a href="https://docs.ropensci.org/sasquatch" title="Use SAS, R, and quarto Together" rel="nofollow" target="_blank">sasquatch</a> (<a href="https://github.com/ropensci/sasquatch/releases/tag/v0.1.3" rel="nofollow" target="_blank"><code>v0.1.3</code></a>), <a href="https://docs.ropensci.org/lingtypology" title="Linguistic Typology and Mapping" rel="nofollow" target="_blank">lingtypology</a> (<a href="https://github.com/ropensci/lingtypology/releases/tag/v1.1.25" rel="nofollow" target="_blank"><code>v1.1.25</code></a>), and <a href="https://docs.ropensci.org/rerddap" title="General Purpose Client for ERDDAP&#x2122; Servers" rel="nofollow" target="_blank">rerddap</a> (<a href="https://github.com/ropensci/rerddap/releases/tag/v1.2.3" rel="nofollow" target="_blank"><code>v1.2.3</code></a>).</p>
<p>Post on dfms release: <a href="https://sebkrantz.github.io/Rblog/2026/01/29/releasing-dfms-1-0-fast-and-feature-rich-estimation-of-dynamic-factor-models-in-r/" rel="nofollow" target="_blank">Releasing dfms 1.0: Fast and Feature-Rich Estimation of Dynamic Factor Models in R</a>.</p>
<h2>
Software Peer Review
</h2><p>There are fifteen recently closed and active submissions and five submissions on hold. Issues are at different stages:</p>
<ul>
<li>
<p>One at <a href="https://github.com/ropensci/software-review/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%226/approved%22" rel="nofollow" target="_blank">‘6/approved’</a>:</p>
<ul>
<li><a href="https://github.com/ropensci/software-review/issues/729" rel="nofollow" target="_blank">suwo</a>, Access Nature Media Repositories Through R. Submitted by <a href="https://marce10.github.io/" rel="nofollow" target="_blank">Marcelo Araya-Salas</a>.</li>
</ul>
</li>
<li>
<p>One at <a href="https://github.com/ropensci/software-review/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%225/awaiting-reviewer(s)-response%22" rel="nofollow" target="_blank">‘5/awaiting-reviewer(s)-response’</a>:</p>
<ul>
<li><a href="https://github.com/ropensci/software-review/issues/671" rel="nofollow" target="_blank">pkgmatch</a>, Find R Packages Matching Either Descriptions or Other R Packages. Submitted by <a href="https://mpadge.github.io/" rel="nofollow" target="_blank">mark padgham</a>.</li>
</ul>
</li>
<li>
<p>Two at <a href="https://github.com/ropensci/software-review/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%224/review(s)-in-awaiting-changes%22" rel="nofollow" target="_blank">‘4/review(s)-in-awaiting-changes’</a>:</p>
<ul>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/741" rel="nofollow" target="_blank">logolink</a>, An Interface for Running NetLogo Simulations. Submitted by <a href="http://danielvartan.com/" rel="nofollow" target="_blank">Daniel Vartanian</a>.</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/615" rel="nofollow" target="_blank">galamm</a>, Generalized Additive Latent and Mixed Models. Submitted by <a href="https://osorensen.github.io/" rel="nofollow" target="_blank">Øystein Sørensen</a>. (Stats).</p>
</li>
</ul>
</li>
<li>
<p>Six at <a href="https://github.com/ropensci/software-review/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%223/reviewer(s)-assigned%22" rel="nofollow" target="_blank">‘3/reviewer(s)-assigned’</a>:</p>
<ul>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/760" rel="nofollow" target="_blank">pvEBayes</a>, Empirical Bayes Methods for Pharmacovigilance. Submitted by <a href="https://github.com/YihaoTancn" rel="nofollow" target="_blank">Yihao Tan</a>. (Stats).</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/754" rel="nofollow" target="_blank">saperlipopette</a>, Create Example Git Messes. Submitted by <a href="https://masalmon.eu/" rel="nofollow" target="_blank">Maëlle Salmon</a>.</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/730" rel="nofollow" target="_blank">ernest</a>, A Toolkit for Nested Sampling. Submitted by <a href="https://github.com/kylesnap" rel="nofollow" target="_blank">Kyle Dewsnap</a>. (Stats).</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/718" rel="nofollow" target="_blank">rcrisp</a>, Automate the Delineation of Urban River Spaces. Submitted by <a href="https://github.com/cforgaci" rel="nofollow" target="_blank">Claudiu Forgaci</a>. (Stats).</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/709" rel="nofollow" target="_blank">reviser</a>, Tools for Studying Revision Properties in Real-Time Time Series Vintages. Submitted by <a href="https://marcburri.github.io/" rel="nofollow" target="_blank">Marc Burri</a>.</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/704" rel="nofollow" target="_blank">priorsense</a>, Prior Diagnostics and Sensitivity Analysis. Submitted by <a href="https://github.com/n-kall" rel="nofollow" target="_blank">Noa Kallioinen</a>. (Stats).</p>
</li>
</ul>
</li>
<li>
<p>Two at <a href="https://github.com/ropensci/software-review/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%222/seeking-reviewer(s)%22" rel="nofollow" target="_blank">‘2/seeking-reviewer(s)’</a>:</p>
<ul>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/750" rel="nofollow" target="_blank">nycOpenData</a>, Convenient Access to NYC Open Data API Endpoints. Submitted by <a href="https://github.com/martinezc1" rel="nofollow" target="_blank">Christian Martinez</a>.</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/743" rel="nofollow" target="_blank">RAMEN</a>, Regional Association of Methylome variability with the Exposome and geNome. Submitted by <a href="https://erick-navarrodelgado.netlify.app/" rel="nofollow" target="_blank">Erick Navarro-Delgado</a>.</p>
</li>
</ul>
</li>
<li>
<p>Three at <a href="https://github.com/ropensci/software-review/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%221/editor-checks%22" rel="nofollow" target="_blank">‘1/editor-checks’</a>:</p>
<ul>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/744" rel="nofollow" target="_blank">RAQSAPI</a>, A Simple Interface to the US EPA Air Quality System Data Mart API. Submitted by <a href="https://github.com/mccroweyclinton-EPA" rel="nofollow" target="_blank">mccroweyclinton-EPA</a>.</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/740" rel="nofollow" target="_blank">fcmconfr</a>, Fuzzy Cognitive Map Analysis in R. Submitted by <a href="https://github.com/bhroston" rel="nofollow" target="_blank">benroston</a>. (Stats).</p>
</li>
<li>
<p><a href="https://github.com/ropensci/software-review/issues/717" rel="nofollow" target="_blank">coevolve</a>, Fit Bayesian Generalized Dynamic Phylogenetic Models using Stan. Submitted by <a href="https://scottclaessens.github.io/" rel="nofollow" target="_blank">Scott Claessens</a>. (Stats).</p>
</li>
</ul>
</li>
</ul>
<p>Find out more about <a href="https://ropensci.org/software-review" rel="nofollow" target="_blank">Software Peer Review</a> and how to get involved.</p>
<h2>
On the blog
</h2><!-- Do not forget to rebase your branch! -->
<h3>
Software Review
</h3><ul>
<li>
<p><a href="https://ropensci.org/blog/2026/02/26/ropensci-ai-policy" rel="nofollow" target="_blank">Software Review in the Era of AI: What We Are Testing at rOpenSci</a> by Mark Padgham, Noam Ross, Maëlle Salmon, Yanina Bellini Saibene, Mauro Lepore, Emily Riederer, Jouni Helske, and Francisco Rodriguez-Sanchez. rOpenSci is testing preliminary policies on the use of generative AI tools, with proposed updates to documentation and procedures for authors submitting software for review, for editors, and for reviewers.</p>
</li>
<li>
<p><a href="https://ropensci.org/blog/2026/03/02/devguide-1.0.0" rel="nofollow" target="_blank">rOpenSci Dev Guide 1.0.0: Trilingual and Improved</a> by Maëlle Salmon, Mark Padgham, and Noam Ross. Updates in version 1.0.0 of the online book ‘rOpenSci Packages: Development, Maintenance, and Peer Review’. Other languages: <a href='https://ropensci.org/es/blog/2026/03/02/r_open_sci_dev_guide_1_0_0_triling%C3%BCe_y_mejorada' lang='es' rel="nofollow" target="_blank">rOpenSci Dev Guide 1.0.0: Trilingüe y mejorada (es)</a>, <a href='https://ropensci.org/pt/blog/2026/03/02/guia_de_desenvolvimento_da_r_open_sci_1_0_0_tril%C3%ADngue_e_aprimorado' lang='pt' rel="nofollow" target="_blank">Guia de desenvolvimento da rOpenSci 1.0.0: trilíngue e aprimorado (pt)</a>.</p>
</li>
</ul>
<figure class="center"><img src="https://i2.wp.com/ropensci.org/blog/2026/03/30/news-mars-2026/cover.png?w=400&#038;ssl=1"
alt="cover of rOpenSci dev guide, showing a package production line with small humans discussing, examining and promoting packages"  data-recalc-dims="1">
</figure>
<ul>
<li><a href="https://ropensci.org/blog/2026/03/10/patentsview-breaking-release" rel="nofollow" target="_blank">Breaking Release of the patentsview R Package</a> by Russ Allen and Chris Baker.</li>
</ul>
<h2>
Calls for contributions
</h2><h3>
Calls for maintainers
</h3><p>If you’re interested in maintaining any of the R packages below, you might enjoy reading our blog post <a href="https://ropensci.org/blog/2023/02/07/what-does-it-mean-to-maintain-a-package/" rel="nofollow" target="_blank">What Does It Mean to Maintain a Package?</a>.</p>
<ul>
<li>
<p><a href="https://docs.ropensci.org/NLMR" rel="nofollow" target="_blank">NLMR</a>, R package to simulate neutral landscape models. <a href="https://github.com/ropensci/NLMR/issues/116" rel="nofollow" target="_blank">Issue for volunteering</a>.</p>
</li>
<li>
<p><a href="https://docs.ropensci.org/landscapetools" rel="nofollow" target="_blank">landscapetools</a>, R package for some of the less-glamorous tasks involved in landscape analysis. <a href="https://github.com/ropensci/landscapetools/issues/48" rel="nofollow" target="_blank">Issue for volunteering</a>.</p>
</li>
<li>
<p><a href="https://docs.ropensci.org/hddtools" rel="nofollow" target="_blank">hddtools</a>, Tools to discover hydrological data, accessing catalogues and databases from various data providers. <a href="https://github.com/ropensci/hddtools/issues/36" rel="nofollow" target="_blank">Issue for volunteering</a>.</p>
</li>
<li>
<p><a href="https://docs.ropensci.org/qualtRics/" rel="nofollow" target="_blank">qualtRics</a>, download Qualtrics survey data. <a href="https://github.com/ropensci/qualtRics/issues/383" rel="nofollow" target="_blank">Issue for volunteering</a>.</p>
</li>
</ul>
<h3>
Calls for contributions
</h3><p>Refer to our <a href="https://ropensci.org/help-wanted/" rel="nofollow" target="_blank">help wanted page</a> – before opening a PR, we recommend asking in the issue whether help is still needed.</p>
<h2>
Package development corner
</h2><p>Some useful tips for R package developers. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f440.png" alt="👀" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3>
A new R core member!
</h3><p>The R Foundation announced that <a href="https://uk.linkedin.com/in/heathrturnr" rel="nofollow" target="_blank">Heather Turner</a> has joined the <a href="https://www.r-project.org/contributors.html" rel="nofollow" target="_blank">R Core Team</a>! <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f389.png" alt="🎉" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
<h3>
How to browse the R mailing lists
</h3><p>The <a href="https://www.r-project.org/mail.html" rel="nofollow" target="_blank">official mailing lists of the R project</a> like <a href="https://blog.r-hub.io/2019/04/11/r-package-devel/" rel="nofollow" target="_blank">R-package-devel</a> are full of important and useful information. But how do you browse them, given that the default website is not easy to search? You can use the <a href="https://mail-archive.com/r-devel@r-project.org/" rel="nofollow" target="_blank">mail-archive</a> website (thanks to Hugo Gruson for the reminder!) or a new project by James Balamuta: the <a href="https://r-mailing-lists.thecoatlessprofessor.com/" rel="nofollow" target="_blank">R Mailing Lists Archive</a>!</p>
<h3>
“Claude Code: Setting up ast-grep with R support”
</h3><p>Thanks to Mauro Lepore for sharing this blog post by Emil Hvitfeldt: <a href="https://emilhvitfeldt.com/post/ast-grep-r-claude-code/" rel="nofollow" target="_blank">“Claude Code: Setting up ast-grep with R support”</a>. ast-grep is a tool for querying code by syntax rather than brittle regular expressions. The blog post describes how to add R support to this tool, and how to take advantage of it when using Claude.</p>
<h3>
On muffling messages from packages
</h3><p>A follow-up on our post <a href="https://ropensci.org/blog/2024/02/06/verbosity-control-packages/" rel="nofollow" target="_blank">“Please Shut Up! Verbosity Control in Packages”</a>.</p>
<ul>
<li>With the {cli} R package you can change the default handler for messages. See the <a href="https://cli.r-lib.org/articles/semantic-cli.html#cli-messages" rel="nofollow" target="_blank">docs</a>. It seems mostly used to muffle messages, e.g. in <a href="https://github.com/etiennebacher/flir/blob/9254cd01d258d0bafcee41a44e5caa7104fed832/R/lint.R#L104" rel="nofollow" target="_blank">flir</a>.</li>
<li>Here’s how the usethis R package <a href="https://github.com/r-lib/usethis/commit/f0f3f91494a1b15c1b08ee78dc73ab7d1cf8b6a8" rel="nofollow" target="_blank">muffles gert message selectively</a>.</li>
</ul>
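<p>As a minimal base-R sketch of the selective approach (the function name and condition class below are made up for illustration; they are not from cli, usethis, or gert), a package can signal messages with a custom condition class, which callers can then muffle by class while letting other messages through:</p>
<figure class="highlight"><pre># Signal a message that carries a custom condition class
notify_progress &lt;- function(msg) {
  cnd &lt;- simpleMessage(paste0(msg, &quot;\n&quot;))
  class(cnd) &lt;- c(&quot;mypkg_progress&quot;, class(cnd))
  message(cnd)
}

noisy_task &lt;- function() {
  notify_progress(&quot;step 1 done&quot;)   # classed, muffled below
  message(&quot;an ordinary message&quot;)   # unclassed, still printed below
  invisible(TRUE)
}

# Since R 4.0.0, suppressMessages() takes a `classes` argument, so only
# the classed progress messages are silenced:
suppressMessages(noisy_task(), classes = &quot;mypkg_progress&quot;)</pre></figure>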
<h2>
Last words
</h2><p>Thanks for reading! If you want to get involved with rOpenSci, check out our <a href="https://contributing.ropensci.org/" rel="nofollow" target="_blank">Contributing Guide</a> that can help direct you to the right place, whether you want to make code contributions, non-code contributions, or contribute in other ways like sharing use cases. You can also support our work through <a href="https://ropensci.org/donate" rel="nofollow" target="_blank">donations</a>.</p>
<p>If you haven’t subscribed to our newsletter yet, you can <a href="https://ropensci.org/news/" rel="nofollow" target="_blank">do so via a form</a>. Until it’s time for our next newsletter, you can keep in touch with us via our <a href="https://ropensci.org/" rel="nofollow" target="_blank">website</a> and <a href="https://hachyderm.io/@rOpenSci" rel="nofollow" target="_blank">Mastodon account</a>.</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://ropensci.org/blog/2026/03/30/news-mars-2026/"> rOpenSci - open tools for open science</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/ropensci-news-digest-march-2026/">rOpenSci News Digest, March 2026</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400176</post-id>	</item>
		<item>
		<title>Pacific island energy supply by @ellis2013nz</title>
		<link>https://www.r-bloggers.com/2026/03/pacific-island-energy-supply-by-ellis2013nz/</link>
		
		<dc:creator><![CDATA[free range statistics - R]]></dc:creator>
		<pubDate>Sun, 29 Mar 2026 13:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://freerangestats.info/blog/2026/03/30/pacific-energy</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> With the conflict in Iran causing worldwide disruption to energy markets, I have both a work and personal interest in energy supply in Pacific islands, which led me to this blog post. Here I look at just two aspects of energy: electricity generation, a...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/pacific-island-energy-supply-by-ellis2013nz/">Pacific island energy supply by @ellis2013nz</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on <strong><a href="https://freerangestats.info/blog/2026/03/30/pacific-energy"> free range statistics - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>]. (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>With the conflict in Iran causing worldwide disruption to energy markets, I have both a work and personal interest in energy supply in Pacific islands, which led me to this blog post. Here I look at just two aspects of energy: electricity generation, and household cooking. Nothing fancy here, just accessing some data and drawing a couple of plots.</p>

<h2 id="electricity-generation">Electricity generation</h2>

<p>Here is the <em>source</em> of electricity for Pacific island countries, plus Australia and New Zealand, collated by Our World In Data from Energy Institute data that ultimately comes from government estimates:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0320-pict-electricity-mix.svg" width="450"><img src="https://i0.wp.com/freerangestats.info/img/0320-pict-electricity-mix.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>There’s a pretty obvious story here: most of the Pacific is <strong>very</strong> dependent on “oil” (in the form of diesel) to generate most of its electricity. Some small steps towards renewables have been taken in recent years, but the vulnerability to a price or availability shock for diesel is pretty obvious.</p>

<p>Here’s the code for producing that, using the valuable <code>owidapi</code> R package to access the Our World in Data API.</p>

<figure class="highlight"><pre>#---------------Set up-----------------
library(owidapi)
library(tidyverse)
library(countrycode)
library(WDI)
library(jsonlite)
library(janitor)
library(httr2)
library(scales)  # for percent_format(), used in both charts


pic_codes &lt;- 
  c(
    &quot;ASM&quot;, &quot;COK&quot;, &quot;FSM&quot;, &quot;FJI&quot;, &quot;PYF&quot;, &quot;GUM&quot;, &quot;KIR&quot;, &quot;MHL&quot;, &quot;NRU&quot;, &quot;NCL&quot;,
    &quot;NIU&quot;, &quot;MNP&quot;, &quot;PLW&quot;, &quot;PNG&quot;, &quot;PCN&quot;, &quot;WSM&quot;, &quot;SLB&quot;, &quot;TKL&quot;, &quot;TON&quot;, &quot;TUV&quot;,
    &quot;VUT&quot;, &quot;WLF&quot;, &quot;AUS&quot;, &quot;NZL&quot;
  )
stopifnot(length(pic_codes) == 24)

# visual check we've got the right country codes for the Pacific:
countrycode::countrycode(pic_codes, origin = &quot;iso3c&quot;, destination = &quot;country.name.en&quot;)

#=======================electricity source===================

palette &lt;- c(
  coal = &quot;brown&quot;,
  gas = &quot;magenta&quot;,
  oil = &quot;red&quot;,
  kerosene = &quot;red&quot;,
  electricity = &quot;purple&quot;,
  solar = &quot;yellow&quot;,
  wind = &quot;steelblue&quot;,
  hydro = &quot;darkblue&quot;,
  bioenergy = &quot;lightgreen&quot;,
  charcoal = &quot;grey&quot;,
  biomass = &quot;darkgreen&quot;,
  'other renewables' = &quot;darkgreen&quot;
)

#-------------------electricity mix-----------------
elec_mix &lt;- owid_get(
  chart_id = &quot;share-elec-by-source&quot;,
  entities  = pic_codes
)

elec_data &lt;- elec_mix |&gt; 
   rename(country = entity_name) |&gt; 
   select(-entity_id) |&gt; 
   gather(variable, value, -country, -year) |&gt;
   filter(value != 0) |&gt; 
   filter(year &gt; 2001) |&gt; 
   mutate(variable = gsub(&quot;_share_of_electricity__pct&quot;, &quot;&quot;, variable, fixed = TRUE),
          variable = gsub(&quot;_&quot;, &quot; &quot;, variable),
          variable = gsub(&quot; excluding bioenergy&quot;, &quot;&quot;, variable),
          variable = fct_drop(variable)) |&gt; 
   mutate(variable = fct_relevel(variable, c(&quot;bioenergy&quot;, &quot;hydro&quot;, &quot;other renewables&quot;), after = Inf)) |&gt; 
   mutate(country = fct_relevel(country, c(&quot;Australia&quot;, &quot;New Zealand&quot;), after = Inf)) |&gt; 
  group_by(country) |&gt; 
  mutate(prop_pc = sum(value[variable %in% c(&quot;oil&quot;, &quot;gas&quot;) & year == max(year)]) 
         / sum(value[year == max(year)])) |&gt; 
  ungroup() |&gt; 
  mutate(country = fct_reorder(country, prop_pc))

# Draw chart
elec_data |&gt; 
  ggplot(aes(x = year, y = value, fill = variable)) +
  facet_wrap(~country, ncol = 5) +
  geom_col() +
  scale_fill_manual(values = palette) +
  scale_y_continuous(label = percent_format(scale = 1)) +
  labs(y = &quot;Percentage of electricity&quot;,
       fill = &quot;Source:&quot;,
       title = &quot;Share of electricity by source&quot;,
       subtitle = &quot;Countries shown in increasing order of vulnerability of electricity to a petrochemicals price or availability crisis.&quot;,
       x = &quot;&quot;,
       caption = &quot;Source: Ember (2026); Energy Institute - Statistical Review of World Energy (2025). Data processed by Our World In Data.&quot;) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))</pre></figure>

<h2 id="cooking-fuel">Cooking fuel</h2>

<p>OK, so electricity generation could be threatened by a lack of diesel. What about household cooking? This next chart draws on the definitive World Health Organization Household Energy Database, which models (based on whatever household survey data is available) what households use to cook:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0320-pict-cooking-fuel.svg" width="450"><img src="https://i1.wp.com/freerangestats.info/img/0320-pict-cooking-fuel.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>Again, we see a lot of reliance on petrochemical products, particularly kerosene and liquefied natural gas. The latter has been promoted as a relatively clean and healthy fuel to cook with compared to burning biomass (e.g. wood, coconuts).</p>

<p>The larger Melanesian countries, with their high rural populations, still make the greatest use of biomass for cooking. Most Pacific island countries do most of their cooking with oil- or gas-derived energy (remembering from the first chart that ‘electricity’ often ultimately means diesel).</p>

<p>Here’s the code to produce that chart. I used an LLM (I forget which) for the code to access the API itself, but I tested it and tweaked it to match my style, and the chart of course is all my own code.</p>

<figure class="highlight"><pre>#--------------------cooking-------------------
# The definitive source is the WHO  WHO Household Energy Database 
# which draws on various household surveys
# See https://www.who.int/data/gho/data/themes/air-pollution/cooking-fuel-and-technology-database-by-fuel-category

# next half dozen lines of code were supplied by Co-pilot and minimally
# tweaked by me for my style
indicator_code &lt;- &quot;PHE_HHAIR_PROP_POP_CATEGORY_FUELS&quot;  # % of population by fuel type
url &lt;- paste0(&quot;https://ghoapi.azureedge.net/api/&quot;, indicator_code)

resp &lt;- request(url) |&gt; 
  req_headers(`Accept` = &quot;application/json&quot;)  |&gt; 
  req_perform()

cooking_data &lt;- fromJSON(resp_body_string(resp), flatten = TRUE)$value |&gt;
  as_tibble() |&gt; 
  clean_names()


pic_cooking_data &lt;- cooking_data |&gt; 
  filter(spatial_dim %in% pic_codes) |&gt; 
  filter(dim1 == &quot;RESIDENCEAREATYPE_TOTL&quot;) |&gt; 
  mutate(fuel_type = tolower(gsub(&quot;HOUSEHOLDCOOKINGFUEL_FUEL_&quot;, &quot;&quot;, dim2))) |&gt; 
  mutate(year = as.numeric(time_dimension_value)) |&gt; 
  select(value = numeric_value,
         iso3_code = spatial_dim,
         year,
         fuel_type) |&gt; 
  mutate(country = countrycode(iso3_code, origin = &quot;iso3c&quot;, destination = &quot;country.name.en&quot;),
         country = gsub(&quot;Federated States&quot;, &quot;Fed St&quot;, country)) |&gt;
  group_by(country) |&gt; 
  mutate(prop_gke = sum(value[fuel_type %in% c(&quot;gas&quot;, &quot;kerosene&quot;, &quot;electricity&quot;) & year == max(year)]) 
         / sum(value[year == max(year)])) |&gt; 
  ungroup() |&gt; 
  mutate(country = fct_reorder(country, prop_gke))

# Draw chart
pic_cooking_data |&gt; 
  ggplot(aes(y = value, x = year, fill = fuel_type)) +
  facet_wrap(~country, ncol = 5) +
  # The numbers don't always add up to 100, because these are modelled
  # estimates that are not fully MECE and don't count dual fuels.
  # Good practice is to not force them to sum to 100%.
  geom_area() +
  scale_fill_manual(values = palette) +
  scale_y_continuous(label = percent_format(scale = 1)) +
  labs(title = &quot;Household primary fuel used for cooking&quot;,
       subtitle = &quot;Estimates are modelled by WHO, and not adding up to 100% is a known limitation.
Countries shown in increasing order of vulnerability of cooking to a petrochemicals price or availability crisis.&quot;,
       x = &quot;&quot;,
       fill = &quot;Fuel type:&quot;,
       y = &quot;Proportion of households&quot;,
       caption = &quot;Source: WHO Household Energy Database&quot;)</pre></figure>

<p>That’s all, just a quick one today.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://freerangestats.info/blog/2026/03/30/pacific-energy"> free range statistics - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/pacific-island-energy-supply-by-ellis2013nz/">Pacific island energy supply by @ellis2013nz</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400172</post-id>	</item>
		<item>
		<title>Navigating Financial Statement And The Story It Tells Us &#8211; A Note To Myself</title>
		<link>https://www.r-bloggers.com/2026/03/navigating-financial-statement-and-the-story-it-tells-us-a-note-to-myself/</link>
		
		<dc:creator><![CDATA[r on Everyday Is A School Day]]></dc:creator>
		<pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.kenkoonwong.com/blog/financial-statement/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>📊 Dipping my toes into financial statements — income, balance sheet &#038; cash flow. Still don’t fully get it, but slowly piecing together the story these numbers tell. Warren Buffett makes it look easy 😅 Baby steps! 🌱</p>
<p>Motivations</p>
<p>      ...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/navigating-financial-statement-and-the-story-it-tells-us-a-note-to-myself/">Navigating Financial Statement And The Story It Tells Us – A Note To Myself</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.kenkoonwong.com/blog/financial-statement/"> r on Everyday Is A School Day</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<blockquote>
<p><img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f4ca.png" alt="📊" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Dipping my toes into financial statements — income, balance sheet &#038; cash flow. Still don’t fully get it, but slowly piecing together the story these numbers tell. Warren Buffett makes it look easy <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f605.png" alt="😅" class="wp-smiley" style="height: 1em; max-height: 1em;" /> Baby steps! <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f331.png" alt="🌱" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
</blockquote>




<h2 id="motivations">Motivations
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#motivations" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>I’ve always wanted to learn financial statements: what they mean, what they tell us, what Warren Buffett sees in them. Following the book 
<a href="https://www.amazon.com/Warren-Buffett-Interpretation-Financial-Statements/dp/1849833192" rel="nofollow" target="_blank">Warren Buffett and the Interpretation of Financial Statements: The Search for the Company with a Durable Competitive Advantage</a> and 
<a href="https://datacamp.pxf.io/15bPYD" rel="nofollow" target="_blank">Datacamp: Analyzing Financial Statement in Python</a>, I’ve made some notes for myself and also created the metrics functions, so that I can use and view them easily in the future. I’ll be honest, I still don’t fully understand it, but at least I can refer back to this as I look at these statements more frequently.</p>




<h4 id="disclaimer">Disclaimer:
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#disclaimer" rel="nofollow" target="_blank"></a>
</h4>
<p><em>This is purely for educational purposes. This is not financial advice, nor am I a financial advisor. This is a note to myself. If you find any mistakes or errors, please let me know. Thanks! Also, there is a lot of information in each section; I won’t be covering all of it, just mostly the metrics from the book and the points I found interesting.</em></p>




<h2 id="objectives">Objectives:
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#objectives" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<ul>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#skeleton" rel="nofollow" target="_blank">The Skeleton of Financial Statements</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#example" rel="nofollow" target="_blank">Let’s Take An Example</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#income" rel="nofollow" target="_blank">Income Statement</a>
<ul>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#gpm" rel="nofollow" target="_blank">Gross Profit Margin</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#depreciation" rel="nofollow" target="_blank">Depreciation</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#interest" rel="nofollow" target="_blank">Interest Payment to Operating Income</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#incomebeforetax" rel="nofollow" target="_blank">Income Before Tax</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#incomeaftertax" rel="nofollow" target="_blank">Income After Tax</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#netearnings" rel="nofollow" target="_blank">Net Earnings</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#share" rel="nofollow" target="_blank">Per Share Earning</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#operatingmargin" rel="nofollow" target="_blank">Operating Margin</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#metric1" rel="nofollow" target="_blank">Metrics</a></li>
</ul>
</li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#balance" rel="nofollow" target="_blank">Balance Sheet</a>
<ul>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#asset" rel="nofollow" target="_blank">Current Assets</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#receivables" rel="nofollow" target="_blank">Net Receivables To Gross Sale Ratio</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#currentratio" rel="nofollow" target="_blank">The Current Ratio</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#ppe" rel="nofollow" target="_blank">Property, Plant, and Equipment</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#short" rel="nofollow" target="_blank">Short Term Debt</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#long" rel="nofollow" target="_blank">Long Term Debt</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#retained" rel="nofollow" target="_blank">Retained Earnings</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#treasury" rel="nofollow" target="_blank">Treasury Stock</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#ROSE" rel="nofollow" target="_blank">Return On Shareholders’ Equity</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#metric2" rel="nofollow" target="_blank">Metrics</a></li>
</ul>
</li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#cashflow" rel="nofollow" target="_blank">Cash Flow</a>
<ul>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#operating" rel="nofollow" target="_blank">Operating Income</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#capex" rel="nofollow" target="_blank">Capital Expenditure</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#buyback" rel="nofollow" target="_blank">Stock Buyback</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#metric3" rel="nofollow" target="_blank">Metrics</a></li>
</ul>
</li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#combine" rel="nofollow" target="_blank">Combine All Metrics</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#another" rel="nofollow" target="_blank">Let’s Look At Another Example</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#opportunities" rel="nofollow" target="_blank">Opportunities For Improvement</a></li>
<li>
<a href="https://www.kenkoonwong.com/blog/financial-statement/#lessons" rel="nofollow" target="_blank">Lessons Learnt</a></li>
</ul>




<h2 id="skeleton">The Skeleton of Financial Statements
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#skeleton" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>A financial statement is a formal record of a company’s financial activities and position, typically consisting of three core components: the <code>income statement</code> (which shows revenues earned and expenses incurred to calculate profit or loss over a period), the <code>balance sheet</code> (which presents what the company owns as assets, what it owes as liabilities, and the difference between them as equity at a specific point in time), and the <code>cash flow</code> statement (which tracks the actual movement of cash in and out of the business through operating, investing, and financing activities). The <code>income statement reveals profitability</code>, the balance sheet shows financial position, and the <code>cash flow statement shows liquidity</code> and how money actually moves through the business.</p>
<p>If we were to think of a kid’s lemonade shop, the <code>income statement</code> would show how much money the shop made from selling lemonade and how much it spent on ingredients and on paying Johnny an hourly wage to sell it (salary), to calculate the profit. The <code>balance sheet</code> would list the shop’s assets (like cash in the register, inventory of lemons and sugar, and any equipment) and liabilities (like loans or unpaid bills &#8211; money you borrowed from your parents to buy all of the above) to show the net worth of the business at a given moment. The <code>cash flow statement</code> would track the actual cash coming in from customers and going out for expenses, giving insight into whether the shop has enough liquidity to cover its day-to-day operations.</p>
<p>It sounds simple in the big picture, but this is just the basic skeleton of financial statements. There are many nuances and details we need to understand to really grasp the story these statements are telling us. Each section has its own items, and some of these items are good at forming different metrics to tell the story of how the lemonade business is doing. Below is just a snapshot of Apple’s financial statement.</p>




<h4 id="income-statement">Income Statement
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#income-statement" rel="nofollow" target="_blank"></a>
</h4>
<p align="center">
  <img loading="lazy" src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/income1.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
<p align="center">
  <img loading="lazy" src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/income2.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>




<h4 id="balance-sheet">Balance Sheet
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#balance-sheet" rel="nofollow" target="_blank"></a>
</h4>
<p align="center">
  <img loading="lazy" src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/balance1.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
<p align="center">
  <img loading="lazy" src="https://i1.wp.com/www.kenkoonwong.com/blog/financial-statement/balance2.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>




<h4 id="cash-flow">Cash Flow
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#cash-flow" rel="nofollow" target="_blank"></a>
</h4>
<p align="center">
  <img loading="lazy" src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/cash.png?w=450&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>




<h2 id="example">Let’s Take An Example
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#example" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>Let’s go to 
<a href="https://www.alphavantage.co/" rel="nofollow" target="_blank">Alpha Vantage</a> and create a free api key and then pull Apple’s 10 year financial statement and go through as an exercise.</p>
<p>
<a href="https://www.sec.gov/Archives/edgar/data/320193/000032019325000079/aapl-20250927.htm#i719388195b384d85a4e238ad88eba90a_181" rel="nofollow" target="_blank">https://www.sec.gov/Archives/edgar/data/320193/000032019325000079/aapl-20250927.htm#i719388195b384d85a4e238ad88eba90a_181</a></p>
<pre>library(httr)
library(jsonlite)
library(tidyverse)

api_key &lt;- &quot;your_api_key_here&quot; # safer: store this in .Renviron, as in the combine_all code below

## Create a function to pull data
get_data &lt;- function(fx,ticker) {
  raw &lt;- GET(paste0(
    &quot;https://www.alphavantage.co/query?function=&quot;,fx,
    &quot;&#038;symbol=&quot;,ticker,&quot;&#038;apikey=&quot;, api_key
  )) |&gt;
    content(as = &quot;text&quot;, encoding = &quot;UTF-8&quot;) |&gt;
    fromJSON()
  
  df &lt;- raw$annualReports |&gt; 
    as_tibble() |&gt; 
    mutate(across(-c(fiscalDateEnding, reportedCurrency), as.numeric)) |&gt;
    mutate(fiscalDateEnding = as.Date(fiscalDateEnding)) |&gt;
    arrange(fiscalDateEnding)

  return(df)
}

## financial statement
income &lt;- get_data(&quot;INCOME_STATEMENT&quot;,&quot;AAPL&quot;)
balance &lt;- get_data(&quot;BALANCE_SHEET&quot;,&quot;AAPL&quot;)
cashflow &lt;- get_data(&quot;CASH_FLOW&quot;,&quot;AAPL&quot;)
</pre><p>Let’s visualize the income statement</p>
<img src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-2-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />




<h2 id="income">Income Statement
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#income" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>Alright, here we go! The income statement tells the story of how the goods and services are doing in the market, how much it costs to produce and sell them, and how much profit is left after all expenses are accounted for. It is a dynamic statement that shows the flow of money over a period of time, typically a quarter or a year. It is like a movie telling the story of the company’s operations and profitability. Below are some ratios and heuristics for spotting companies with a durable competitive advantage, according to Warren Buffett.</p>




<h3 id="gpm">Gross Profit Margin
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#gpm" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Gross Profit Margin = Gross Profit / Total Revenue\)</code></p>
<p>In the book, we are looking for a consistent margin, as a general rule <code>above 40%</code>, to consider a company to have a durable competitive advantage. Gross profit here is basically <code>Total Revenue - Cost of Revenue</code>; it does not account for R&D, administrative costs, etc. Gross profit margin is a measure of how much profit a company makes from its core operations, before accounting for other expenses. A higher gross profit margin indicates that the company has a strong competitive position in the market and is able to generate more profit from its sales.</p>
<p>So for Apple, you can see that in the sales section there are products and services. I assume products are the hardware and services are things like cloud storage. The cost of revenue section mirrors these categories, showing how much it costs to make these products/services. Again, remember this is all just about the goods; it does not account for the R&D, the offices that manage these, administrative costs, etc., I think. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
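<p>As a quick sanity check, here is a minimal base-R sketch of the gross profit margin on made-up numbers (purely illustrative figures, not Apple’s actual filings):</p>

```r
# Illustrative (made-up) income statement figures, in millions
total_revenue <- 400000
cost_of_revenue <- 220000

gross_profit <- total_revenue - cost_of_revenue   # revenue minus cost of revenue only
gross_profit_margin <- gross_profit / total_revenue

gross_profit_margin        # 0.45
gross_profit_margin > 0.4  # clears the book's ~40% heuristic
```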




<h3 id="depreciation">Depreciation
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#depreciation" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>Apparently this is a non-cash expense that reflects the reduction in value of a company’s assets over time. It is an accounting method used to allocate the cost of tangible assets (like machinery, equipment, buildings) and intangible assets (like patents, copyrights) over their useful lives. Depreciation allows companies to spread out the expense of an asset over several years, rather than recognizing the entire cost in the year it was purchased.</p>
<p>A quick ratio in the book is</p>
<p><code>\(Depreciation / Gross Profit\)</code></p>
<p>which should be low <code>~5-7%</code> which indicates that the company is not heavily reliant on physical assets that may lose value over time. Warren stated that EBITDA (Earnings Before Depreciation, Taxes, and Amortization) is something Wall Street loves, but in Warren’s eyes, depreciation is a real expanse. Why is it that he said EBITDA is something Wall Street love? Because it shows a higher profit by excluding non-cash expenses like depreciation, which can make a company look more profitable than it actually is… interesting. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>




<h3 id="interest">Interest Payment to Operating Income
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#interest" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Interest Payment / Operating Income\)</code></p>
<p>should be <code>less than 15%</code>. This ratio indicates that the company is not overly burdened by debt and has a healthy balance between its operating income and interest expenses. A lower ratio suggests that the company is generating sufficient operating income to cover its interest payments, which is a positive sign of financial stability.</p>
<p><code>Operating Income = Total Revenue - Cost of Revenue - Operating Expenses</code>. And <code>Operating Expenses = R&D + Selling, General and Administrative Expenses</code>. Wow, so many terminologies and I still don’t fully understand them, but at least I can refer back to this when I look at these statements.</p>
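<p>The decomposition above can be sketched in base R with made-up numbers (illustrative, not from any filing):</p>

```r
# Illustrative (made-up) figures, in millions
total_revenue <- 400000
cost_of_revenue <- 220000
rnd <- 30000               # R&D
sga <- 25000               # selling, general and administrative
interest_expense <- 4000

operating_expenses <- rnd + sga
operating_income <- total_revenue - cost_of_revenue - operating_expenses

interest_to_operating <- interest_expense / operating_income
interest_to_operating         # 0.032
interest_to_operating < 0.15  # well under the book's 15% threshold
```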




<h3 id="incomebeforetax">Income Before Tax
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#incomebeforetax" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>This is the value Warren uses when he calculates the return he is getting when he buys a whole business.</p>




<h3 id="incomeaftertax">Income After Tax
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#incomeaftertax" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>This value is the truth test of a business. What is reported to the SEC should reflect the pre-tax income reported on the income statement. Apparently some companies like to report higher values than the truth? That was what the book said <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>




<h3 id="netearnings">Net Earnings
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#netearnings" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Net Earning / Total Revenue\)</code></p>
<p>The heuristic here is to look for <code>more than 20%</code>, which indicates that the company is able to generate a significant amount of profit from its total revenue, and hence a long-term competitive advantage.</p>
<p><code>Net Earnings = Operating Income - Interest Expense - Taxes</code>. This is the bottom line of the income statement, representing the company’s total profit after all expenses have been deducted from total revenue. It is a key indicator of a company’s profitability and financial performance. A higher net earnings figure indicates that the company is generating more profit from its operations, which can be a sign of a strong competitive position in the market.</p>
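<p>A minimal base-R sketch of the net earnings margin on made-up figures (illustrative only):</p>

```r
# Illustrative (made-up) figures, in millions
operating_income <- 125000
interest_expense <- 4000
taxes <- 19000
total_revenue <- 400000

net_earnings <- operating_income - interest_expense - taxes
net_earning_margin <- net_earnings / total_revenue

net_earning_margin        # 0.255
net_earning_margin > 0.2  # clears the 20% heuristic
```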




<h3 id="share">Per Share Earning
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#share" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>This is something Warren wants to see consistent increment over time.</p>
<p>This is calculated by:
<code>\(EPS = Net Income/Outstanding Share\)</code></p>
<p>Does that mean if net income is negative, we’ll have a negative EPS? <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f914.png" alt="🤔" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>
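<p>A tiny base-R sketch of EPS on made-up figures (the share count is hypothetical; income and shares both in millions):</p>

```r
# Illustrative (made-up) figures
net_income <- 102000          # in millions
shares_outstanding <- 15000   # in millions

eps <- net_income / shares_outstanding
eps  # 6.8

# And yes: a negative net income gives a negative EPS
-net_income / shares_outstanding  # -6.8
```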




<h3 id="operatingmargin">Operating Margin
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#operatingmargin" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Operating Margin = Operating Income / Total Revenue\)</code></p>
<p>This is a measure of a company’s profitability that indicates how much profit it generates from its operations relative to its total revenue. A higher operating margin suggests that the company is more efficient at converting revenue into profit, which can be a sign of a strong competitive position in the market. Warren looks for <code>more than 10%</code>, which indicates that the company has a durable competitive advantage.</p>




<h4 id="metric1">Metrics
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#metric1" rel="nofollow" target="_blank"></a>
</h4>
<pre>income_metric &lt;- function(df) {
  df_i &lt;- df |&gt;
    mutate(grossProfitMargin = grossProfit / totalRevenue,
           depreciationToGrossProfit = depreciationAndAmortization / grossProfit,
           interestExpenseToOperatingIncome = interestExpense / operatingIncome,
           netEarningMargin = netIncome / totalRevenue,
           operatingMargin = operatingIncome / totalRevenue)
  
  hline &lt;- tribble(
    ~param, ~hline_value, ~color2,
    &quot;grossProfitMargin&quot;, 0.4, &quot;red&quot;, 
    &quot;depreciationToGrossProfit&quot;, 0.07, &quot;blue&quot;,
    &quot;interestExpenseToOperatingIncome&quot;, 0.15, &quot;blue&quot;,
    &quot;netEarningMargin&quot;, 0.2, &quot;red&quot;,
    &quot;operatingMargin&quot;, 0.1, &quot;red&quot;
  ) |&gt;
    mutate(
      param = factor(param),
      ymin = ifelse(color2 == &quot;red&quot;, -Inf, hline_value),
      ymax = ifelse(color2 == &quot;red&quot;, hline_value, Inf)
    )
  
  columns &lt;- hline$param
  
  plot &lt;- df_i |&gt;
    select(fiscalDateEnding, grossProfitMargin:operatingMargin) |&gt;
    pivot_longer(cols = c(grossProfitMargin:operatingMargin), 
                 names_to = &quot;param&quot;, values_to = &quot;values&quot;) |&gt;
    mutate(param = factor(param, levels = columns)) |&gt; 
    ggplot(aes(x = fiscalDateEnding, y = values)) +
    geom_rect(
      data = hline,
      aes(ymin = ymin, ymax = ymax, fill = color2),
      xmin = -Inf, xmax = Inf,
      alpha = 0.15,
      inherit.aes = FALSE
    ) +
    geom_line() +
    facet_wrap(. ~ param, scales = &quot;free_y&quot;) +
    scale_fill_identity() +
    theme_bw()
  
  return(plot)
}

income_metric(income) 
</pre><img src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-3-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>What is EBITDA? It stands for <code>Earnings Before Interest, Taxes, Depreciation, and Amortization</code>. It is a measure of a company’s operating performance that excludes non-operating items such as interest, taxes, depreciation, and amortization. It is often used as a proxy for cash flow from operations, as it focuses on the core profitability of the business before accounting for non-cash expenses and tax obligations.</p>




<h2 id="balance">Balance Sheet
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#balance" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p><code>\(Asset = Liability + Shareholder Equity\)</code></p>
<p>Reminded me of Fullmetal Alchemist’s famous phrase “tōka kōkan”, equivalent exchange.</p>
<p align="center">
<img loading="lazy" src="https://i2.wp.com/images5.alphacoders.com/840/840678.jpg?w=50%25&#038;ssl=1" alt="image" height="auto" data-recalc-dims="1">
</p>
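<p>The identity can be sketched in base R on made-up figures; shareholder equity is simply what is left of assets after liabilities:</p>

```r
# The accounting identity: assets = liabilities + shareholders' equity
# Illustrative (made-up) figures, in millions
total_assets <- 350000
total_liabilities <- 280000
shareholder_equity <- total_assets - total_liabilities

shareholder_equity  # 70000
total_assets == total_liabilities + shareholder_equity  # TRUE, by construction
```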




<h3 id="asset">Current Assets
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#asset" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>A current asset is any asset that can reasonably be expected to be converted into cash within one year. This includes cash and cash equivalents, accounts receivable, inventory, and other short-term assets. Current assets are important because they provide insight into a company’s liquidity and ability to meet its short-term obligations.</p>




<h3 id="receivables">Net Receivables To Gross Sale Ratio
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#receivables" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Net Receivables / Gross Sale\)</code></p>
<p>If a company consistently has a lower ratio (around <code>5%</code> or less), it may indicate that the company is efficient at collecting payments from customers and has a lower risk of bad debts. A higher ratio may suggest that the company is having difficulty collecting payments, which could lead to cash flow issues and potential losses from uncollected receivables.</p>




<h3 id="currentratio">The Current Ratio
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#currentratio" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Current Ratio = Current Assets / Current Liabilities\)</code></p>
<p>The current ratio is a liquidity ratio that measures a company’s ability to pay off its short-term liabilities with its short-term assets. A current ratio of <code>1 or higher</code> is generally considered good, indicating that the company has enough assets to cover its liabilities. A current ratio below <code>1</code> may indicate that the company may have difficulty meeting its short-term obligations. This makes sense.</p>
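As a quick base R sanity check of the formula above (toy numbers, not any real filing):

```r
# Hypothetical balance sheet figures, in millions
total_current_assets <- 135
total_current_liabilities <- 154

# Current Ratio = Current Assets / Current Liabilities
current_ratio <- total_current_assets / total_current_liabilities
round(current_ratio, 2)  # below 1, so short-term obligations outweigh short-term assets
```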




<h3 id="ppe">Property, Plant, and Equipment
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#ppe" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>This is the value of a company’s physical assets, such as land, buildings, machinery, and equipment. It is important to consider the value of PPE when evaluating a company’s financial health and potential for growth. A company with a significant amount of PPE may have a competitive advantage in its industry, as it may be able to produce goods or services more efficiently than its competitors. However, it is also important to consider the age and condition of the PPE, as well as any potential liabilities associated with it. In the title of this chapter, it says <code>For Warren Not Having Them Is A Good Thing</code> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f923.png" alt="🤣" class="wp-smiley" style="height: 1em; max-height: 1em;" /></p>




<h3 id="short">Short Term Debt
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#short" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>From Warren’s perspective, when it comes to investing in financial institutions, he has always shied away from companies that borrow more heavily short-term than long-term.</p>
<p><code>\(Short Term Debt / Long Term Debt\)</code></p>
<p>Not really sure what the heuristic threshold is, but let’s use <code>less than 1</code> as a good indicator.</p>




<h3 id="long">Long Term Debt
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#long" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>As a general rule, a company with a durable competitive edge will have little or no long-term debt to maintain its business operations. Let’s make sure this is not trending upward when we visualize it.</p>




<h3 id="retained">Retained Earnings
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#retained" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>Retained earnings is the portion of a company’s net income that is retained and not distributed as dividends to shareholders. It represents the accumulated profits that a company has reinvested in its business over time. Retained earnings can be used for various purposes, such as funding research and development, expanding operations, paying off debt, or acquiring other companies. It is an important metric for investors to consider when evaluating a company’s financial health and growth potential, as it indicates how much profit the company has generated and how it has been utilized to support its long-term success.</p>
<p><code>\(Growth Rate = (Ending Retained Earnings/Beginning Retained Earnings)^{1/years}-1\)</code></p>
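The growth rate formula above is a compound annual growth rate; here is a base R sketch with made-up retained earnings (hypothetical figures, not any real company):

```r
# Hypothetical retained earnings over five fiscal years, in millions
retained_earnings <- c(45, 52, 60, 70, 83)
years <- length(retained_earnings) - 1  # four compounding periods

# Growth Rate = (Ending Retained Earnings / Beginning Retained Earnings)^(1/years) - 1
growth_rate <- (retained_earnings[length(retained_earnings)] /
                  retained_earnings[1])^(1 / years) - 1
round(growth_rate, 3)  # about 0.165, i.e. roughly 16.5% per year
```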




<h3 id="treasury">Treasury Stock
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#treasury" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>Treasury stock refers to shares that a company has repurchased from its shareholders. These shares are held in the company’s treasury and are not considered outstanding shares. Treasury stock can be used for various purposes, such as to increase shareholder value, to have shares available for employee compensation plans, or to prevent hostile takeovers. When a company repurchases its own shares, it reduces the number of outstanding shares in the market, which can increase the value of the remaining shares and potentially boost earnings per share (EPS). However, it is important for investors to consider the reasons behind a company’s decision to buy back its own stock and how it may impact the company’s financial health and long-term growth prospects.</p>




<h3 id="ROSE">Return On Shareholders’ Equity
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#ROSE" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p><code>\(Return On Shareholders' Equity = Net Income / Shareholders' Equity\)</code></p>
<p>According to the book, <code>high ROSE means come play</code> <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f923.png" alt="🤣" class="wp-smiley" style="height: 1em; max-height: 1em;" /> How about stop and smell the rose? <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f339.png" alt="🌹" class="wp-smiley" style="height: 1em; max-height: 1em;" /> The book did not tell us exactly what the threshold is, but the companies of choice have about <code>~30-35%</code></p>
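A minimal base R sketch of the ROSE computation, with hypothetical figures chosen to land in that <code>~30-35%</code> band:

```r
# Hypothetical figures, in millions
net_income <- 95
shareholders_equity <- 280

# Return On Shareholders' Equity = Net Income / Shareholders' Equity
rose <- net_income / shareholders_equity
round(rose, 2)  # 0.34, within the ~30-35% band
```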




<h4 id="metrics">Metrics
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#metrics" rel="nofollow" target="_blank"></a>
</h4>
<pre>balance_metric &lt;- function(df, income) {
  df_b &lt;- df |&gt;
    mutate(currentRatio = totalCurrentAssets / totalCurrentLiabilities,
           netReceivablesToGrossSale = currentNetReceivables / income$grossProfit,
           shortToLongTermDebt = shortTermDebt / longTermDebt,
           growthRateRetainedEarnings = (retainedEarnings / lag(retainedEarnings))^(1/n()) - 1,
           returnOnShareholdersEquity = income$netIncome / totalShareholderEquity)
  
  hline &lt;- tribble(
    ~param, ~hline_value, ~color2,
    &quot;currentRatio&quot;, 1, &quot;red&quot;, 
    &quot;netReceivablesToGrossSale&quot;, 0.05, &quot;blue&quot;,
    &quot;shortToLongTermDebt&quot;, 1, &quot;blue&quot;,
    &quot;growthRateRetainedEarnings&quot;, 0.05, &quot;red&quot;,
    &quot;returnOnShareholdersEquity&quot;, 0.3, &quot;red&quot;
  ) |&gt;
    mutate(
      param = factor(param),
      ymin = ifelse(color2 == &quot;red&quot;, -Inf, hline_value),
      ymax = ifelse(color2 == &quot;red&quot;, hline_value, Inf)
    )
  
  columns &lt;- hline$param
  
  plot &lt;- df_b |&gt;
    select(fiscalDateEnding, currentRatio:returnOnShareholdersEquity) |&gt;
    pivot_longer(cols = c(currentRatio:returnOnShareholdersEquity), 
                 names_to = &quot;param&quot;, values_to = &quot;values&quot;) |&gt;
    mutate(param = factor(param, levels = columns)) |&gt; 
    ggplot(aes(x = fiscalDateEnding, y = values)) +
    geom_rect(
      data = hline,
      aes(ymin = ymin, ymax = ymax, fill = color2),
      xmin = -Inf, xmax = Inf,
      alpha = 0.15,
      inherit.aes = FALSE
    ) +
    geom_line() +
    facet_wrap(. ~ param, scale = &quot;free_y&quot;) +
    scale_fill_identity() +
    theme_bw()
  
  return(plot)
}

balance_metric(balance, income)
</pre><img src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-4-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />




<h2 id="cashflow">Cash Flow
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#cashflow" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>




<h3 id="operating">Operating Income
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#operating" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>Cash flow from operating income starts with net income and then adds back depreciation and amortization. This is because depreciation and amortization are non-cash expenses that reduce net income but do not actually involve a cash outflow. By adding them back, we get a better picture of the actual cash generated by the company’s operations.</p>
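The add-back described above, as a base R sketch (hypothetical figures):

```r
# Hypothetical income statement figures, in millions
net_income <- 95
depreciation_and_amortization <- 11

# D&A reduced net income on paper, but no cash actually left the company,
# so we add it back to approximate cash generated by operations
operating_cash_flow <- net_income + depreciation_and_amortization
operating_cash_flow  # 106
```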




<h3 id="capex">Capital Expenditure
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#capex" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>Capital expenditure, also known as CapEx, refers to the funds that a company uses to acquire, upgrade, and maintain physical assets such as property, buildings, technology, equipment, or machinery. CapEx is an important metric for investors to consider when evaluating a company’s financial health and growth potential, as it indicates how much the company is investing in its long-term success.</p>
<p>If we were to look at Apple’s cash flow statement, CapEx is payments for acquisition of property, plant, and equipment.</p>
<p><code>\(Capital Expenditure / Net Earnings\)</code></p>
<p>And the heuristic is <code>~50% or less</code>.</p>
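Checking that heuristic in base R (hypothetical figures; note CapEx is often reported as a negative cash flow, so take the absolute value):

```r
# Hypothetical cash flow figures, in millions
capital_expenditures <- -11  # reported as a cash outflow
net_earnings <- 95

capex_to_net_earnings <- abs(capital_expenditures) / net_earnings
capex_to_net_earnings < 0.5  # TRUE: within the ~50%-or-less heuristic
```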




<h3 id="buyback">Stock Buyback
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#buyback" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<p>Stock buyback, also known as share repurchase, refers to a company’s practice of buying back its own shares from the open market. This can be done for various reasons, such as to increase shareholder value, to have shares available for employee compensation plans, or to prevent hostile takeovers. When a company repurchases its own shares, it reduces the number of outstanding shares in the market, which can increase the value of the remaining shares and potentially boost earnings per share (EPS).</p>
<p>On the Apple cash flow statement, stock buyback is listed as repurchase of common stock. Other companies might report it as issuance (retirement) of stock.</p>




<h4 id="metrics-1">Metrics
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#metrics-1" rel="nofollow" target="_blank"></a>
</h4>
<pre>cashflow_metric &lt;- function(df) {
  df_c &lt;- df |&gt;
    mutate(operatingCashFlow = netIncome + depreciationDepletionAndAmortization,
           capitalExpenditureToNetEarning = capitalExpenditures / netIncome,
           stockBuybackToNetEarning = abs(proceedsFromRepurchaseOfEquity / netIncome))
  
  hline &lt;- tribble(
    ~param, ~hline_value, ~color2,
    &quot;operatingCashFlow&quot;, 0, &quot;red&quot;, 
    &quot;capitalExpenditureToNetEarning&quot;, 0.5, &quot;blue&quot;,
    &quot;stockBuybackToNetEarning&quot;, 0.5, &quot;red&quot;
  ) |&gt;
    mutate(
      param = factor(param),
      ymin = ifelse(color2 == &quot;red&quot;, -Inf, hline_value),
      ymax = ifelse(color2 == &quot;red&quot;, hline_value, Inf)
    )
  
  columns &lt;- hline$param
  
  plot &lt;- df_c |&gt;
    select(fiscalDateEnding, operatingCashFlow, capitalExpenditureToNetEarning, stockBuybackToNetEarning) |&gt;
    pivot_longer(cols = c(operatingCashFlow, capitalExpenditureToNetEarning, stockBuybackToNetEarning), 
                 names_to = &quot;param&quot;, values_to = &quot;values&quot;) |&gt;
    mutate(param = factor(param, levels = columns)) |&gt; 
    ggplot(aes(x = fiscalDateEnding, y = values)) +
    geom_rect(
      data = hline,
      aes(ymin = ymin, ymax = ymax, fill = color2),
      xmin = -Inf, xmax = Inf,
      alpha = 0.15,
      inherit.aes = FALSE
    ) +
    geom_line() +
    facet_wrap(. ~ param, scale = &quot;free_y&quot;) +
    scale_fill_identity() +
    theme_bw()
  
  return(plot)
}

cashflow_metric(cashflow)
</pre><img src="https://i1.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-5-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Use key valuation metrics (P/E, EV/EBITDA, P/B, P/S, etc) to determine how cheap or expensive a stock is.</p>




<h2 id="combine">Combine All Metrics
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#combine" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<details>
<summary>code</summary>
<pre>library(httr)
library(jsonlite)
library(tidyverse)
library(ggpubr)

api_key &lt;- Sys.getenv(&quot;avkey&quot;)

## Create a function to pull data
get_data &lt;- function(fx,ticker,share=F) {
  raw &lt;- GET(paste0(
    &quot;https://www.alphavantage.co/query?function=&quot;,fx,
    &quot;&#038;symbol=&quot;,ticker,&quot;&#038;apikey=&quot;, api_key
  )) %&gt;%
    content(as = &quot;text&quot;, encoding = &quot;UTF-8&quot;) %&gt;%
    fromJSON()
  
  if (share==T) {
    df &lt;- raw$annualEarnings |&gt;
      select(fiscalDateEnding, reportedEPS) |&gt;
      mutate(fiscalDateEnding = ymd(fiscalDateEnding))
  } else {
  df &lt;- raw$annualReports |&gt; 
    as_tibble() |&gt;
    mutate(fiscalDateEnding = as.Date(fiscalDateEnding)) |&gt;   
    mutate(across(where(is.character), as.numeric)) |&gt;        
    arrange(fiscalDateEnding)
  }

  return(df)
}

income_metric &lt;- function(df) {
  df_i &lt;- df |&gt;
    mutate(grossProfitMargin = grossProfit / totalRevenue,
           depreciationToGrossProfit = depreciationAndAmortization / grossProfit,
           interestExpenseToOperatingIncome = interestExpense / operatingIncome,
           netEarningMargin = netIncome / totalRevenue,
           operatingMargin = operatingIncome / totalRevenue)
  
  hline &lt;- tribble(
    ~param, ~hline_value, ~color2,
    &quot;grossProfitMargin&quot;, 0.4, &quot;red&quot;, 
    &quot;depreciationToGrossProfit&quot;, 0.07, &quot;blue&quot;,
    &quot;interestExpenseToOperatingIncome&quot;, 0.15, &quot;blue&quot;,
    &quot;netEarningMargin&quot;, 0.2, &quot;red&quot;,
    &quot;operatingMargin&quot;, 0.1, &quot;red&quot;
  ) |&gt;
    mutate(
      param = factor(param),
      ymin = ifelse(color2 == &quot;red&quot;, -Inf, hline_value),
      ymax = ifelse(color2 == &quot;red&quot;, hline_value, Inf)
    )
  
  columns &lt;- hline$param
  
  plot &lt;- df_i |&gt;
    select(fiscalDateEnding, grossProfitMargin:operatingMargin) |&gt;
    pivot_longer(cols = c(grossProfitMargin:operatingMargin), 
                 names_to = &quot;param&quot;, values_to = &quot;values&quot;) |&gt;
    mutate(param = factor(param, levels = columns)) |&gt; 
    ggplot(aes(x = fiscalDateEnding, y = values)) +
    geom_rect(
      data = hline,
      aes(ymin = ymin, ymax = ymax, fill = color2),
      xmin = -Inf, xmax = Inf,
      alpha = 0.15,
      inherit.aes = FALSE
    ) +
    geom_line() +
    facet_wrap(. ~ param, scale = &quot;free_y&quot;) +
    scale_fill_identity() +
    theme_bw() +
    ggtitle(&quot;Income Statement Metrics&quot;)
  
  return(plot)
}

share_metric &lt;- function(df) {
    plot &lt;- df |&gt;
      mutate(reportedEPS = as.numeric(reportedEPS)) |&gt;
      ggplot(aes(x = fiscalDateEnding, y = reportedEPS)) +
      geom_line() +
      geom_smooth() +
      theme_bw() +
      ggtitle(&quot;Share Metrics&quot;)
    return(plot)
}

balance_metric &lt;- function(df, income) {
  df_b &lt;- df |&gt;
    mutate(currentRatio = totalCurrentAssets / totalCurrentLiabilities,
           netReceivablesToGrossSale = currentNetReceivables / income$grossProfit,
           shortToLongTermDebt = shortTermDebt / longTermDebt,
           growthRateRetainedEarnings = (retainedEarnings / lag(retainedEarnings))^(1/n()) - 1,
           returnOnShareholdersEquity = income$netIncome / totalShareholderEquity)
  
  hline &lt;- tribble(
    ~param, ~hline_value, ~color2,
    &quot;currentRatio&quot;, 1, &quot;red&quot;, 
    &quot;netReceivablesToGrossSale&quot;, 0.3, &quot;blue&quot;,
    &quot;shortToLongTermDebt&quot;, 1, &quot;blue&quot;,
    &quot;growthRateRetainedEarnings&quot;, 0.05, &quot;red&quot;,
    &quot;returnOnShareholdersEquity&quot;, 0.3, &quot;red&quot;
  ) |&gt;
    mutate(
      param = factor(param),
      ymin = ifelse(color2 == &quot;red&quot;, -Inf, hline_value),
      ymax = ifelse(color2 == &quot;red&quot;, hline_value, Inf)
    )
  
  columns &lt;- hline$param
  
  plot &lt;- df_b |&gt;
    select(fiscalDateEnding, currentRatio:returnOnShareholdersEquity) |&gt;
    pivot_longer(cols = c(currentRatio:returnOnShareholdersEquity), 
                 names_to = &quot;param&quot;, values_to = &quot;values&quot;) |&gt;
    mutate(param = factor(param, levels = columns)) |&gt;
    ggplot(aes(x = fiscalDateEnding, y = values)) +
    geom_rect(
      data = hline,
      aes(ymin = ymin, ymax = ymax, fill = color2),
      xmin = -Inf, xmax = Inf,
      alpha = 0.15,
      inherit.aes = FALSE
    ) +
    geom_line() +
    facet_wrap(. ~ param, scale = &quot;free_y&quot;) +
    scale_fill_identity() +
    theme_bw() +
    ggtitle(&quot;Balance Sheet Metrics&quot;)
  
  return(plot)
}

cashflow_metric &lt;- function(df, income) {
  df_c &lt;- df |&gt;
    mutate(operatingCashFlow = netIncome + depreciationDepletionAndAmortization,
           capitalExpenditureToNetEarning = capitalExpenditures / netIncome,
           stockBuybackToNetEarning = case_when(
             proceedsFromRepurchaseOfEquity &lt; 0 ~ -proceedsFromRepurchaseOfEquity / income$netIncome,
             income$netIncome &lt; 0 &#038; proceedsFromRepurchaseOfEquity &gt; 0 ~ NA_real_,
             income$netIncome &lt; 0 &#038; proceedsFromRepurchaseOfEquity &lt; 0 ~-proceedsFromRepurchaseOfEquity / income$netIncome))
  
  hline &lt;- tribble(
    ~param, ~hline_value, ~color2,
    &quot;operatingCashFlow&quot;, 0, &quot;red&quot;, 
    &quot;capitalExpenditureToNetEarning&quot;, 0.5, &quot;blue&quot;,
    &quot;stockBuybackToNetEarning&quot;, 0.5, &quot;red&quot;
  ) |&gt;
    mutate(
      param = factor(param),
      ymin = ifelse(color2 == &quot;red&quot;, -Inf, hline_value),
      ymax = ifelse(color2 == &quot;red&quot;, hline_value, Inf)
    )
  
  columns &lt;- hline$param
  
  plot &lt;- df_c |&gt;
    select(fiscalDateEnding, operatingCashFlow, capitalExpenditureToNetEarning, stockBuybackToNetEarning) |&gt;
    pivot_longer(cols = c(operatingCashFlow, capitalExpenditureToNetEarning, stockBuybackToNetEarning), 
                 names_to = &quot;param&quot;, values_to = &quot;values&quot;) |&gt;
    mutate(param = factor(param, levels = columns)) |&gt; 
    ggplot(aes(x = fiscalDateEnding, y = values)) +
    geom_rect(
      data = hline,
      aes(ymin = ymin, ymax = ymax, fill = color2),
      xmin = -Inf, xmax = Inf,
      alpha = 0.15,
      inherit.aes = FALSE
    ) +
    geom_line() +
    facet_wrap(. ~ param, scale = &quot;free_y&quot;) +
    scale_fill_identity() +
    theme_bw() +
    ggtitle(&quot;Cash Flow Metrics&quot;)
  
  return(plot)
}

show_all &lt;- function(name) {
  income &lt;- get_data(&quot;INCOME_STATEMENT&quot;,name)
  share &lt;- get_data(&quot;EARNINGS&quot;,name,share=T)
  balance &lt;- get_data(&quot;BALANCE_SHEET&quot;,name)
  cashflow &lt;- get_data(&quot;CASH_FLOW&quot;,name)
  plot1 &lt;- income_metric(income) 
  plot2 &lt;- share_metric(share)
  plot3 &lt;- balance_metric(balance, income)
  plot4 &lt;- cashflow_metric(cashflow, income)
  stacked_plot &lt;- ggarrange(plot1,plot3,nrow=2)
  squished_plot &lt;- ggarrange(plot4,plot2,ncol=2, widths = c(2,1))
  combineplot &lt;- ggarrange(plotlist = list(stacked_plot,squished_plot), nrow = 2, heights = c(2,1))
  return(combineplot)
}
</pre></details>




<h2 id="another">Let’s Look At Another Example
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#another" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<p>As a heuristic, we’ve made our dataviz so that <code>red is like lava</code>: you want to stay above it. <code>Blue is like the sky</code>: you want to stay below it. The sweet zone is in between. <img src="https://s.w.org/images/core/emoji/13.0.0/72x72/1f923.png" alt="🤣" class="wp-smiley" style="height: 1em; max-height: 1em;" /> When there is a loess curve, we are trying to see if EPS is consistently increasing.</p>




<h4 id="apple">Apple
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#apple" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;AAPL&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-7-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Looking at the income metrics, Apple seems to be doing pretty well. Gross profit margin, net earning margin, and operating income margin are all above their thresholds. The depreciation to gross profit ratio is appropriately below threshold, which means there are no recent purchases of property, plants, machines, etc., which are what add to depreciation. There is also low to no interest expense, which is good! No debt? Next we move on to balance sheet metrics: the current ratio is not great, so liquidity is not great. Net receivables are ?OK; I guess it makes sense that if your products are popular and you provide some sort of financing option, you might carry some receivables. Short term debt is much lower than long term debt, which is good. The ROSE is smelling pretty good there too! In terms of cash flow metrics, it’s looking really good: cash flowing in is high, CapEx is low, and it has been consistently buying back its own shares from 2020 onwards.</p>




<h4 id="microsoft">Microsoft
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#microsoft" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;MSFT&quot;)
</pre><img src="https://i2.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-8-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>As for Microsoft, now that we’re a bit more familiar with the sections, let’s string our read together instead of separating them. Good, consistent margins across gross profit, operating income, and net income. Interestingly, depreciation is high along with CapEx; did they buy property, plants, or machines? 
<a href="https://www.ciodive.com/news/microsoft-azure-capacity-constraints-datacenter-buildouts-cloud-ai/722912/" rel="nofollow" target="_blank">Ah, they built data centers?</a> Interest expense is low, which is good. Good liquidity, with the current ratio above threshold; net receivables are OK, same as Apple (a competitor); short/long debt is good as well. ROSE is close to the threshold (worth watching). Cash is flowing, not much stock buyback, but overall EPS is consistently increasing. With its investment in data centers for Azure, over the next few years we should be seeing a persistent depreciation to gross profit ratio along with CapEx, right? I read somewhere that you can’t book all the depreciation in a single year.</p>




<h4 id="nvidia">NVIDIA
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#nvidia" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;NVDA&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-9-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Now let’s take a look at NVIDIA. Wow, gross profit, operating income, and net earning margins are all really good, better than Apple and Microsoft! Well, that makes sense: with all these data centers, they need GPUs from NVIDIA. Look at that depreciation and CapEx, barely any! That’s great; there is less to maintain, and the current plants they have are adequate to supply the demand. Low interest expense too, so not much debt interest. Liquidity is great, with a high current ratio! Net receivables are good too; these big companies are paying NVIDIA back! ROSE is also very fragrant! Cash is flowing through the roof. Not much stock buyback. Also exponential EPS! You know, Warren did say to be mindful of companies with R&D costs.</p>




<h4 id="intel">Intel
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#intel" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;INTC&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-10-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Wow, at first glance at the profits, we’re seeing lava red. We then see a steep rise in depreciation and CapEx. Did they buy more property, plants, or machines? 
<a href="https://newsroom.intel.com/press-kit/intel-invests-ohio" rel="nofollow" target="_blank">Intel announces plans for an investment of more than $28 billion for two new chip factories in Licking County, Ohio</a>. And maybe Germany and Arizona too? Interesting. Notice that the interest expense ratio towards the end of 2025 was double digits but negative? That’s because operating income was negative while interest expense was high. I wonder if I should make this absolute as opposed to keeping the negative. Anyway, this means that not only was Intel in the red, but it was also paying quite a bit of interest on its debt, I assume. The current ratio looks pretty good, and the same goes for net receivables. Short term debt is also much lower compared to long term. ROSE, not so good. As for the cash flow, adding depreciation and amortization back to net income brought things back up out of the red. Notice how the EPS dropped significantly lately, and notice that the share buyback is missing? We coded it so that a positive buyback value, which means issuing stock instead of repurchasing, becomes NA. Wow, what do you think? Will they be able to turn this around? Do data centers use Intel chips or AMD’s threadripper?</p>




<h4 id="amd">AMD
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#amd" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;AMD&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-11-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Interesting income statement metric results. Good gross profit margin, but net earnings barely clear the threshold. Interesting depreciation trend: what happened in 2024, when CapEx didn’t budge? Great liquidity, pretty high net receivables. Wow, very high short-to-long term debt in ?2021-2022. ROSE is essentially 0. Cash flow is good toward 2025, and there were high buybacks around 2022. Finally, an uptrending EPS! This is an odd one. Pasting my observations into Claude and wow, these findings are due to the Xilinx acquisition. That makes sense! They didn’t build new plants, and the old property, plants, and machines have already been depreciated, hence low depreciation. The acquisition was funded by debt, hence the high short-to-long term debt ratio. Very interesting, indeed! So, could it be true that data centers prefer AMD over Intel, given the good profit?</p>




<h4 id="google">Google
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#google" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;GOOG&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-12-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Very good profits across all three metrics. After 2020, depreciation downtrended, which is good, and so did CapEx. Not much debt. Downtrending current ratio, but still good. Very good net receivables! ROSE is coming up. Great cash flow. There are buybacks of equity. Rising EPS. All great signs for Google! Being an all-service (no hardware?) company, this is quite good and healthy!</p>




<h4 id="3m">3M
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#3m" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;MMM&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-13-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Now on to 3M, not a tech company, but let’s see if our workflow of looking at financial statements helps us tell a story. Profit is not as great as the tech companies we looked at: gross profit is above threshold, but net earnings are in the lava. Notice both net earnings and operating margins dipped quite low in 2024? And then a pretty high interest-expense-to-operating-income ratio in 2025? I wonder whether, because they were losing money, they borrowed long term, which is why the short-to-long term debt ratio is not too high? The current ratio is acceptable, but why are net receivables so high for 3M? Were clients not able to pay 3M? ROSE is pretty good in 2024 and 2025 even when profits weren’t. But why? Cash flow shows high variance over the last 3 years. There was a stock buyback in 2025. And even though EPS downtrended over the past 2 years, it still remained high. This is a very interesting one as well. 3M should be a company with a competitive advantage because they make lots of daily consumables that don’t need a whole lot of R&D. But why the anomaly in 2024 requiring debt? What do you think?</p>




<h4 id="disney">Disney
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#disney" rel="nofollow" target="_blank"></a>
</h4>
<pre>show_all(&quot;DIS&quot;)
</pre><img src="https://i0.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-14-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Alright, what about Disney? Profit doesn’t seem as good as I expected. Both gross profit and net earnings margins were below threshold; only the operating income margin wasn’t. Interestingly, in 2021 there was an increase in depreciation, CapEx, and interest expense with a low short-to-long term debt ratio. It almost looks like they borrowed some money to purchase something new? Did they rebuild a park or something? In terms of liquidity, it’s in the lava zone in 2025, and it seems to have downtrended for the past 3 years as well. Net receivables are OK? Though I would think they should be lower? ROSE is red. Cash flow looks good, which is interesting because whatever caused the depreciation, when added back, now gives them cash. Also interesting to note that from 2020 to 2024 they were issuing stock rather than buying back. Their EPS appears quite volatile for the past 3 years.</p>




<h3 id="boeing">Boeing
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#boeing" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h3>
<pre>show_all(&quot;BA&quot;)
</pre><img src="https://i1.wp.com/www.kenkoonwong.com/blog/financial-statement/index_files/figure-html/unnamed-chunk-15-1.png?w=450&#038;ssl=1" data-recalc-dims="1" />
<p>Last but not least, let’s look at Boeing. All three profit margin metrics are in the lava zone. There was a spike in interest expense in 2021, along with depreciation. Something physical was bought with borrowed money that’s being paid off long term. The current ratio is good. Essentially no net receivables. Consistently floating ROSE on the red sea. Operating cash flow was in the red for a few years, then positive in 2025. CapEx spiked in 2025; what was the capital spent on that doesn’t really depreciate? Stock buyback is interesting: it’s missing for the past several years, meaning they were essentially issuing stock wherever we have NA. EPS downtrended into the negatives.</p>




<h2 id="opportunities">Opportunities For Improvement
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#opportunities" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<ul>
<li>We need to recheck the growth rate calculation again; not sure the code is correct</li>
<li>Perhaps build this into an R package and develop it further</li>
<li>Learn about valuation metrics such as P/E, EV/EBITDA, P/B, P/S</li>
<li>Not really sure if a short-to-long term debt ratio threshold of 1 is good; it seems too high</li>
<li>Include actual numbers on geom_label with ggrepel and reduce the font size</li>
</ul>




<h2 id="lessons">Lessons learnt
  <a href="https://www.kenkoonwong.com/blog/financial-statement/#lessons" rel="nofollow" target="_blank"><svg class="anchor-symbol" aria-hidden="true" height="26" width="26" viewBox="0 0 22 22" xmlns="http://www.w3.org/2000/svg">
      <path d="M0 0h24v24H0z" fill="currentColor"></path>
      <path d="M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z"></path>
    </svg></a>
</h2>
<ul>
<li>Depreciation and CapEx seem to be strongly correlated, which makes sense.</li>
<li>Learnt the Alpha Vantage API; it’s quite straightforward.</li>
<li>Learnt to look at financial statements and their metrics.</li>
</ul>
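<p>The depreciation–CapEx relationship is easy to check numerically; a minimal sketch, assuming a data frame <code>df</code> with hypothetical <code>depreciation</code> and <code>capex</code> columns from the statements pulled earlier:</p>
<pre># Pearson correlation between depreciation and capital expenditure,
# ignoring years where either value is missing
cor(df$depreciation, df$capex, use = &quot;complete.obs&quot;)
</pre>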
<p>If you like this article:</p>
<ul>
<li>please feel free to send me a 
<a href="https://www.kenkoonwong.com/blog/" rel="nofollow" target="_blank">comment or visit my other blogs</a></li>
<li>please feel free to follow me on 
<a href="https://bsky.app/profile/kenkoonwong.bsky.social" rel="nofollow" target="_blank">BlueSky</a>, 
<a href="https://twitter.com/kenkoonwong/" rel="nofollow" target="_blank">twitter</a>, 
<a href="https://github.com/kenkoonwong/" rel="nofollow" target="_blank">GitHub</a> or 
<a href="https://rstats.me/@kenkoonwong" rel="nofollow" target="_blank">Mastodon</a></li>
<li>if you would like to collaborate, please feel free to 
<a href="https://www.kenkoonwong.com/contact/" rel="nofollow" target="_blank">contact me</a></li>
</ul>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.kenkoonwong.com/blog/financial-statement/"> r on Everyday Is A School Day</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/navigating-financial-statement-and-the-story-it-tells-us-a-note-to-myself/">Navigating Financial Statement And The Story It Tells Us – A Note To Myself</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400170</post-id>	</item>
		<item>
		<title>Explaining Time-Series Forecasts with Sensitivity Analysis (ahead::dynrmf and external regressors)</title>
		<link>https://www.r-bloggers.com/2026/03/explaining-time-series-forecasts-with-sensitivity-analysis-aheaddynrmf-and-external-regressors/</link>
		
		<dc:creator><![CDATA[T. Moudiki]]></dc:creator>
		<pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://thierrymoudiki.github.io//blog/2026/03/29/r/sensi-dynrmf</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> Explaining Time-Series Forecasts with Sensitivity Analysis (ahead::dynrmf and external regressors)</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/explaining-time-series-forecasts-with-sensitivity-analysis-aheaddynrmf-and-external-regressors/">Explaining Time-Series Forecasts with Sensitivity Analysis (ahead::dynrmf and external regressors)</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://thierrymoudiki.github.io//blog/2026/03/29/r/sensi-dynrmf"> T. Moudiki's Webpage - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>Following <a href="https://thierrymoudiki.github.io/blog/2026/03/08/r/exact-shapley-dynrmf" rel="nofollow" target="_blank">the post on exact Shapley values</a> for time series explainability, this post shows how to use sensitivity analysis to explain time-series forecasts, based on the <code>ahead::dynrmf</code> model and external regressors. What is <strong>sensitivity analysis</strong> in this context? It’s about evaluating the impact of changes in the external regressors on the time-series forecast.</p>
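<p>Conceptually, this kind of sensitivity can be approximated by a finite difference: perturb the external regressor slightly, re-forecast, and compare. A minimal sketch of the idea (not the internal implementation of <code>dynrmf_sensi</code>; <code>forecast_fn</code> stands in for any model mapping <code>xreg</code> to a forecast):</p>
<pre>eps &lt;- 1e-3
base &lt;- forecast_fn(y, xreg)          # baseline forecast
bumped &lt;- forecast_fn(y, xreg + eps)  # forecast with perturbed regressor
sensi &lt;- (bumped - base) / eps        # approximate d(forecast)/d(xreg)
</pre>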

<p>The post uses the <a href="https://docs.techtonique.net/ahead/reference/dynrmf_sensi.html" rel="nofollow" target="_blank"><code>ahead::dynrmf_sensi</code></a> function to compute the sensitivities, and the <a href="https://docs.techtonique.net/ahead/reference/plot_dynrmf_sensitivity.html" rel="nofollow" target="_blank"><code>ahead::plot_dynrmf_sensitivity</code></a> function to plot the results.</p>

<p>First, install the package:</p>

<pre>devtools::install_github(&quot;Techtonique/ahead&quot;)
</pre>

<p>Then, run the following code:</p>

<pre># devtools::install_github(&quot;Techtonique/ahead&quot;)
# install.packages(c(&quot;fpp2&quot;, &quot;e1071&quot;, &quot;patchwork&quot;))

library(ahead)
library(fpp2)
library(patchwork)
library(e1071)

#' # Example 1: US Consumption vs Income
sensitivity_results_auto &lt;- ahead::dynrmf_sensi(
y = fpp2::uschange[, &quot;Consumption&quot;],
xreg = fpp2::uschange[, &quot;Income&quot;],
h = 10
)

plot1 &lt;- ahead::plot_dynrmf_sensitivity(sensitivity_results_auto, 
                           title = &quot;Sensitivity of Consumption to Income (Ridge)&quot;,
                           y_label = &quot;Effect (ΔConsumption / ΔIncome)&quot;)

#' # Example 1: US Consumption vs Income
sensitivity_results_auto_svm &lt;- ahead::dynrmf_sensi(
  y = fpp2::uschange[, &quot;Consumption&quot;],
  xreg = fpp2::uschange[, &quot;Income&quot;],
  h = 10, 
  fit_func = e1071::svm # additional parameter passed to ahead::dynrmf
)

plot2 &lt;- ahead::plot_dynrmf_sensitivity(sensitivity_results_auto_svm, 
                                        title = &quot;Sensitivity of Consumption to Income (SVM)&quot;,
                                        y_label = &quot;Effect (ΔConsumption / ΔIncome)&quot;)

 
# Example 2: TV Advertising vs Insurance Quotes
sensitivity_results_tv &lt;- ahead::dynrmf_sensi(
 y = fpp2::insurance[, &quot;Quotes&quot;],
   xreg = fpp2::insurance[, &quot;TV.advert&quot;],
   h = 8
 )

plot3 &lt;- ahead::plot_dynrmf_sensitivity(sensitivity_results_tv,
                           title = &quot;Sensitivity of Insurance Quotes to TV Advertising (Ridge)&quot;,
                           y_label = &quot;Effect (ΔQuotes / ΔTV.advert)&quot;)

sensitivity_results_tv_svm &lt;- ahead::dynrmf_sensi(
  y = fpp2::insurance[, &quot;Quotes&quot;],
  xreg = fpp2::insurance[, &quot;TV.advert&quot;],
  h = 8, 
  fit_func = e1071::svm # additional parameter passed to ahead::dynrmf
)

plot4 &lt;- ahead::plot_dynrmf_sensitivity(sensitivity_results_tv_svm,
                                        title = &quot;Sensitivity of Insurance Quotes to TV Advertising (SVM)&quot;,
                                        y_label = &quot;Effect (ΔQuotes / ΔTV.advert)&quot;)

(plot1+plot2)

(plot3+plot4)
</pre>

<p><img src="https://i0.wp.com/thierrymoudiki.github.io/images/2026-03-29/2026-03-29-image1.png?w=578&#038;ssl=1" alt="image-title-here" class="img-responsive" data-recalc-dims="1" />
<img src="https://i1.wp.com/thierrymoudiki.github.io/images/2026-03-29/2026-03-29-image2.png?w=578&#038;ssl=1" alt="image-title-here" class="img-responsive" data-recalc-dims="1" /></p>


<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://thierrymoudiki.github.io//blog/2026/03/29/r/sensi-dynrmf"> T. Moudiki's Webpage - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/explaining-time-series-forecasts-with-sensitivity-analysis-aheaddynrmf-and-external-regressors/">Explaining Time-Series Forecasts with Sensitivity Analysis (ahead::dynrmf and external regressors)</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400155</post-id>	</item>
		<item>
		<title>Momentum Investing Enhanced by Microsoft Foundry-Hosted Large Language Model</title>
		<link>https://www.r-bloggers.com/2026/03/momentum-investing-enhanced-by-microsoft-foundry-hosted-large-language-model/</link>
		
		<dc:creator><![CDATA[Selcuk Disci]]></dc:creator>
		<pubDate>Sat, 28 Mar 2026 12:48:37 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://datageeek.com/?p=11857</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> LLM-enhanced momentum investing combines traditional momentum signals with real-time news interpretation by large language models (LLMs). The idea is straightforward: stocks with strong past returns are candidates for momentum portfolios, but their inclusion and weight are refined by LLM-generated sentiment scores derived from firm-specific news. This hybrid approach improves risk-adjusted ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/momentum-investing-enhanced-by-microsoft-foundry-hosted-large-language-model/">Momentum Investing Enhanced by Microsoft Foundry-Hosted Large Language Model</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://datageeek.com/2026/03/28/momentum-investing-enhanced-by-microsoft-foundry-hosted-large-language-model/"> DataGeeek</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p class="wp-block-paragraph"><strong><a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5680782" rel="nofollow" target="_blank"><em>LLM-enhanced momentum investing</em></a></strong> combines traditional momentum signals with real-time news interpretation by large language models (LLMs). The idea is straightforward: stocks with strong past returns are candidates for momentum portfolios, but their inclusion and weight are refined by LLM-generated sentiment scores derived from firm-specific news. This hybrid approach improves <strong>risk-adjusted returns</strong> (Sharpe, Sortino) and is particularly effective in concentrated, high-conviction portfolios.</p>



<p class="wp-block-paragraph"><strong>Key Parameters</strong></p>



<p class="wp-block-paragraph">1. <strong>Lookback Window (k)</strong></p>



<ul class="wp-block-list">
<li><strong>Definition:</strong> The number of past days of news considered for sentiment analysis.</li>



<li><strong>Role:</strong> Determines how much recent information the LLM uses to judge momentum continuation.</li>



<li><strong>Example:</strong> If k = 5, the model analyzes the last 5 business days of headlines and summaries for each stock. </li>
</ul>



<p class="wp-block-paragraph">2. <strong>Forecast Horizon (l)</strong></p>



<ul class="wp-block-list">
<li><strong>Definition:</strong> The period over which momentum continuation is predicted.</li>



<li><strong>Role:</strong> Sets the “target” for the LLM’s forecast — how far into the future the model should judge momentum persistence.</li>



<li><strong>Example:</strong> If <code>l = 5</code>, the LLM predicts whether momentum will continue for the next 5 trading days.</li>



<li><strong>Connection to Rebalancing:</strong> The forecast horizon typically aligns with the rebalancing cycle. For weekly rebalancing, the horizon is 5 days; for monthly rebalancing, it’s ~21 days.</li>
</ul>



<p class="wp-block-paragraph">3. <strong>Portfolio Size (m)</strong></p>



<ul class="wp-block-list">
<li><strong>Definition:</strong> The number of stocks selected after LLM scoring.</li>



<li><strong>Role:</strong> Controls how concentrated or diversified the portfolio is.</li>



<li><strong>Example:</strong> From the top 20 YTD performers, you might select the top 10 after sentiment scoring.</li>
</ul>



<p class="wp-block-paragraph">4. <strong>Rebalancing Frequency (T)</strong></p>



<ul class="wp-block-list">
<li><strong>Definition:</strong> How often the portfolio is updated with new signals.</li>



<li><strong>Role:</strong> Sets the rhythm of portfolio refresh — weekly, monthly, or quarterly.</li>



<li><strong>Example:</strong> Weekly rebalancing means recalculating momentum and sentiment scores every 5 trading days.</li>
</ul>



<p class="wp-block-paragraph"><strong>Concept</strong></p>



<p class="wp-block-paragraph">The strategy begins with a <strong>classic momentum screen</strong>: select the <strong>top 20 S&#038;P 500 companies by year-to-date (YTD) performance</strong>. Instead of stopping there, the approach integrates <strong>large language model (LLM) sentiment analysis</strong> of firm-specific news. By analyzing the <strong>last 5 business days of headlines and summaries</strong>, the LLM produces a score indicating whether momentum is likely to continue.</p>



<p class="wp-block-paragraph">These scores are then used to <strong>re-weight the portfolio</strong>, tilting allocations toward companies with stronger news sentiment. Finally, the portfolio is narrowed to the <strong>top 10 conviction stocks</strong>.</p>



<p class="wp-block-paragraph"><strong>Selected Parameters</strong></p>



<ul class="wp-block-list">
<li><strong>Lookback Window:</strong> 5 days of firm-specific news.</li>



<li><strong>Rebalancing Frequency:</strong> Weekly updates of the portfolio.</li>



<li><strong>Forecast Horizon:</strong> 5 trading days (aligned with the rebalancing cycle).</li>
</ul>



<p class="wp-block-paragraph">This setup ensures that the LLM is asked to judge whether momentum will persist until the next rebalance, making the signals both <strong>short-term and actionable</strong>.</p>
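<p class="wp-block-paragraph">The chosen parameters can be gathered in one place; a sketch with illustrative names (these are not variables from the code below):</p>
<pre>params &lt;- list(
  k = 5,  # lookback window: business days of news
  l = 5,  # forecast horizon: trading days, aligned with rebalancing
  m = 10, # portfolio size after LLM scoring
  t = 5   # rebalancing frequency: every 5 trading days (weekly)
)
</pre>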



<p class="wp-block-paragraph"><strong>What the Code Does Step by Step</strong></p>



<ol class="wp-block-list">
<li><strong>Fetching the Data</strong>
<ul class="wp-block-list">
<li>The <strong>R </strong>script first pulls all S&#038;P 500 tickers.</li>



<li>It calculates <strong>YTD returns</strong> for each stock.</li>



<li>The <strong>top 20 stocks</strong> by performance are selected as momentum candidates.</li>
</ul>
</li>



<li><strong>News Sentiment Analysis with LLM</strong>
<ul class="wp-block-list">
<li>For each of these 20 stocks, the code queries Bing News for recent headlines with <a href="https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/data-science-and-machine-learning#ai-services-in-azure" rel="nofollow" target="_blank"><em><strong>Azure AI Services</strong></em></a>.</li>



<li>The last <strong>5 business days of news</strong> are collected.</li>



<li>These headlines and summaries are sent to a <a href="https://learn.microsoft.com/en-us/azure/foundry/what-is-foundry" rel="nofollow" target="_blank"><strong><em>Microsoft Foundry</em></strong></a>-hosted LLM.</li>



<li>The LLM outputs a <strong>score (0–1)</strong> indicating whether sentiment supports momentum continuation or signals reversal.</li>
</ul>
</li>



<li><strong>Portfolio Tilting</strong>
<ul class="wp-block-list">
<li>LLM scores are normalized to [-1, +1].</li>



<li>Baseline equal weights are tilted according to these scores.</li>



<li>The <strong>top 10 stocks</strong> by adjusted weight form the final portfolio.</li>
</ul>
</li>
</ol>



<p class="wp-block-paragraph">4. <strong>Visualization</strong></p>



<ul class="wp-block-list">
<li>A styled table is created using the <a href="https://gt.rstudio.com/" rel="nofollow" target="_blank"><em><strong><code>gt</code> package</strong></em></a>.</li>



<li>Adjusted weights are color-coded (red–green gradient).</li>



<li>The final portfolio is saved as an image (<code>top10.png</code>).</li>
</ul>



<p class="wp-block-paragraph"><strong>Strategic Insight</strong></p>



<p class="wp-block-paragraph">This approach mirrors the methodology in the <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5680782" rel="nofollow" target="_blank"><em><strong>Swiss Finance Institute paper</strong></em></a>:</p>



<ul class="wp-block-list">
<li><strong>Momentum ranking</strong> provides the baseline.</li>



<li><strong>LLM sentiment scoring</strong> refines stock selection and weighting.</li>



<li><strong>Portfolio tilting</strong> integrates qualitative news signals into quantitative allocation.</li>
</ul>


<pre>
library(httr)
library(jsonlite)
library(tidyquant)
library(tidyverse)
library(lubridate)
library(gt)
library(gtExtras)
library(scales)
library(showtext)
library(webshot2)

# 1. Environment & Auth Setup
sysfonts::font_add_google(&quot;Roboto Slab&quot;, &quot;roboto_slab&quot;)
showtext_auto()

# Azure & Bing Credentials
bing_key           &lt;- &quot;&lt;your-bing-key&gt;&quot;
bing_endpoint      &lt;- &quot;&lt;your-bing-endpoint&gt;&quot; 
azure_llm_key      &lt;- &quot;&lt;your-llm-key&gt;&quot;
azure_llm_endpoint &lt;- &quot;&lt;your-llm-endpoint&gt;&quot;

# R Part (first): S&P 500 Screening with YTD Returns
sp500_tickers &lt;- 
  tq_index(&quot;SP500&quot;) %&gt;% 
  select(symbol, company)

# Calculate YTD change
momentum_df &lt;- 
  sp500_tickers %&gt;%
  tq_get(get = &quot;stock.prices&quot;, from = floor_date(today(), &quot;year&quot;)) %&gt;%
  group_by(symbol) %&gt;%
  arrange(date) %&gt;% # Ensure chronological order for first/last functions
  summarize(
    total_return = (last(adjusted) / first(adjusted)) - 1, 
    .groups = &quot;drop&quot;
  ) %&gt;%
  inner_join(sp500_tickers, by = &quot;symbol&quot;) %&gt;%
  slice_max(total_return, n = 20) %&gt;%
  select(symbol, company)

# R Part (second): News Search and LLM Analysis
analyze_momentum_continuation &lt;- function(ticker, company_name) {
  
  # Construct the Bing News Search query: ticker + &#039; stock&#039;
  query_str &lt;- paste0(ticker, &quot; stock&quot;)
  news_url &lt;- paste0(bing_endpoint, &quot;v7.0/news/search&quot;)
  
  # Call Bing News Search API
  news_res &lt;- GET(news_url, add_headers(`Ocp-Apim-Subscription-Key` = bing_key), 
                  query = list(q = query_str, count = 5, freshness = &quot;Day&quot;))
  
  # Short pause to throttle calls and avoid rate limit errors
  Sys.sleep(1)
  
  news_text &lt;- &quot;&quot;
  if (status_code(news_res) == 200) {
    content &lt;- fromJSON(content(news_res, &quot;text&quot;, encoding = &quot;UTF-8&quot;))
    if (length(content$value) &gt; 0) {
      news_text &lt;- paste(content$value$name, content$value$description, collapse = &quot; | &quot;)
    }
  }
  
  # Construct the LLM payload for Azure AI Foundry
  prompt_payload &lt;- list(
    messages = list(
      list(role = &quot;system&quot;, content = &quot;You are an LLM Enhanced Momentum Investing Agent.&quot;),
      list(role = &quot;user&quot;, content = paste0(
        &quot;Headlines + Summaries for &quot;, company_name, &quot; (&quot;, ticker, &quot;): &quot;, news_text,
        &quot;\nPerform sentiment analysis based on the last 5 days of news (lookback=5, horizon=5). &quot;,
        &quot;Infer whether sentiment supports momentum continuation or signals reversal. &quot;,
        &quot;Return a JSON object with two fields: &#039;subsector&#039; (string) and &#039;llm_score&#039; (string: probability 0-1).&quot;))
    ),
    temperature = 0.1
  )
  
  llm_res &lt;- POST(url = azure_llm_endpoint, 
                  add_headers(`api-key` = azure_llm_key, `Content-Type` = &quot;application/json&quot;),
                  body = prompt_payload, encode = &quot;json&quot;)
  
  if (status_code(llm_res) == 200) {
    llm_out &lt;- fromJSON(content(llm_res, &quot;text&quot;, encoding = &quot;UTF-8&quot;))
    # Parse JSON response without using regex
    llm_json_data &lt;- fromJSON(llm_out$choices$message$content)
    return(as.data.frame(llm_json_data))
  } else {
    return(data.frame(subsector = &quot;N/A&quot;, llm_score = &quot;0.5&quot;))
  }
}

# Execute Analysis: Merge top 20 tickers with LLM scores
news_scores_df &lt;- momentum_df %&gt;%
  mutate(analysis = map2(symbol, company, analyze_momentum_continuation)) %&gt;%
  unnest(analysis) %&gt;%
  mutate(llm_score = as.numeric(llm_score))

# R Part (final): Portfolio Tilting and Visualization
# Normalize scores to [-1, +1] and tilt weights
tilted_portfolio &lt;- news_scores_df %&gt;%
  mutate(
    norm_score = rescale(llm_score, to = c(-1, 1), from = c(0, 1)),
    base_weight = 1 / n(),
    adj_weight = base_weight * (1 + norm_score)
  ) %&gt;%
  mutate(adj_weight = adj_weight / sum(adj_weight)) %&gt;%
  slice_max(adj_weight, n = 10)

# Create gt visualization using original column names
final_table &lt;- tilted_portfolio %&gt;%
  select(company, subsector, adj_weight) %&gt;%
  gt() %&gt;%
  tab_header(title = &quot;Top 10 Tilted S&P 500 Momentum Stocks&quot;) %&gt;%
  # Use cols_label for human-readable labels without renaming underlying columns
  cols_label(
    company = &quot;Company&quot;, 
    subsector = &quot;Subsector&quot;, 
    adj_weight = &quot;Adjusted Weight &quot;
  ) %&gt;%
  # Apply color intensity with scales::col_numeric
  data_color(
    columns = adj_weight, 
    colors = col_numeric(palette = c(&quot;red&quot;, &quot;green&quot;), domain = NULL)
  ) %&gt;%
  fmt_percent(
    columns = contains(&quot;adj_weight&quot;), 
    decimals = 2,
    locale = &quot;en&quot; 
  ) %&gt;% 
  cols_align(align = &quot;center&quot;) %&gt;%
  opt_table_font(font = google_font(&quot;Roboto Slab&quot;))

# Save the visualization as top10.png using webshot
gtsave(final_table, &quot;top10.png&quot;)
</pre>


<figure data-wp-context="{"imageId":"69c7da09160ee"}" data-wp-interactive="core/image" data-wp-key="69c7da09160ee" class="wp-block-image size-large wp-lightbox-container"><img loading="lazy" data-attachment-id="11859" data-permalink="https://datageeek.com/2026/03/28/momentum-investing-enhanced-by-microsoft-foundry-hosted-large-language-model/top10/" data-orig-file="https://datageeek.com/wp-content/uploads/2026/03/top10.png" data-orig-size="1500,1530" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="top10" data-image-description="" data-image-caption="" data-medium-file="https://datageeek.com/wp-content/uploads/2026/03/top10.png?w=294" data-large-file="https://i1.wp.com/datageeek.com/wp-content/uploads/2026/03/top10.png?w=450&#038;ssl=1" data-wp-class--hide="state.isContentHidden" data-wp-class--show="state.isContentVisible" data-wp-init="callbacks.setButtonStyles" data-wp-on--click="actions.showLightbox" data-wp-on--load="callbacks.setButtonStyles" data-wp-on--pointerdown="actions.preloadImage" data-wp-on--pointerenter="actions.preloadImageWithDelay" data-wp-on--pointerleave="actions.cancelPreload" data-wp-on-window--resize="callbacks.setButtonStyles" src="https://i1.wp.com/datageeek.com/wp-content/uploads/2026/03/top10.png?w=450&#038;ssl=1" alt="" class="wp-image-11859" srcset_temp="https://i1.wp.com/datageeek.com/wp-content/uploads/2026/03/top10.png?w=450&#038;ssl=1 1004w, https://datageeek.com/wp-content/uploads/2026/03/top10.png?w=147 147w, https://datageeek.com/wp-content/uploads/2026/03/top10.png?w=294 294w, https://datageeek.com/wp-content/uploads/2026/03/top10.png?w=768 768w, https://datageeek.com/wp-content/uploads/2026/03/top10.png?w=1440 1440w, https://datageeek.com/wp-content/uploads/2026/03/top10.png 1500w" sizes="(max-width: 1004px) 100vw, 1004px" 
data-recalc-dims="1" /></figure>



<p class="wp-block-paragraph"><strong>Final Observation</strong></p>



<p class="wp-block-paragraph">Looking at the resulting portfolio, one striking feature is the <strong>dominance of energy and petroleum companies</strong>. Firms such as <strong>ConocoPhillips, EOG Resources, ExxonMobil, Occidental Petroleum, Marathon Petroleum, Valero, Phillips 66, Chevron, and Baker Hughes</strong> all appear prominently.</p>



<p class="wp-block-paragraph">This heavy tilt toward energy is not random—it reflects how <strong>geopolitical tensions (Iran and U.S.–Israel war dynamics)</strong> have amplified the importance of oil and gas in global markets. News sentiment around these companies has been strongly supportive of continued momentum, pushing them into the top allocation slots.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://datageeek.com/2026/03/28/momentum-investing-enhanced-by-microsoft-foundry-hosted-large-language-model/"> DataGeeek</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/momentum-investing-enhanced-by-microsoft-foundry-hosted-large-language-model/">Momentum Investing Enhanced by Microsoft Foundry-Hosted Large Language Model</a>]]></content:encoded>
					
		
		<enclosure url="https://datageeek.com/wp-content/uploads/2026/03/image.png" length="0" type="" />
<enclosure url="https://1.gravatar.com/avatar/db5e3f9ef188ea98fe38ab05c5a3fad9fb52fe3472715a8fc02f7ea41731f77c?s=96&#038;d=identicon&#038;r=G" length="0" type="" />
<enclosure url="https://datageeek.com/wp-content/uploads/2026/03/top10.png?w=1004" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400135</post-id>	</item>
		<item>
		<title>Why Learning R is a Good Career Move in 2026</title>
		<link>https://www.r-bloggers.com/2026/03/why-learning-r-is-a-good-career-move-in-2026/</link>
		
		<dc:creator><![CDATA[The Jumping Rivers Blog]]></dc:creator>
		<pubDate>Thu, 26 Mar 2026 23:59:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Over the course of my career as a Data Scientist, I’ve worked on projects ranging from simple code reviews to large application builds. For the most part, I have used R to do this.<br />
If you’re getting into coding or data science, one q...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/why-learning-r-is-a-good-career-move-in-2026/">Why Learning R is a Good Career Move in 2026</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/"> The Jumping Rivers Blog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p>
<a href = "https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/">
<img src="https://i1.wp.com/www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/featured.png?w=400&#038;ssl=1" style="width:400px" class="image-center" style="display: block; margin: auto;" data-recalc-dims="1" />
</a>
</p>
<p>Over the course of my career as a Data Scientist, I’ve worked on projects ranging from simple code reviews to large application builds. For the most part, I have used R to do this.</p>
<p>If you’re getting into coding or data science, one question you’re probably asking yourself is <em>“Which language should I learn?”</em></p>
<p>This blog aims to show you why R might be a good decision.</p>
<aside class="advert">
<p>
Join us for our AI in Production conference! For more details, check out our
<a href="https://ai-in-production.jumpingrivers.com/" rel="nofollow" target="_blank">conference website!</a>
</p>
</aside>
<hr>
<h2 id="r-was-built-for-data-not-just-programming">R was built for data (not just programming)</h2>
<p>Unlike general-purpose languages (such as Python), R was designed specifically for statistics and data analysis.</p>
<p>That means:</p>
<ul>
<li>Built-in statistical tools</li>
<li>Powerful visualisation capabilities</li>
<li>Research-level methods available immediately</li>
</ul>
<p>With packages like the <strong>tidyverse</strong>, you can clean, analyse, and visualise data with surprisingly little code.</p>
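<p>As a quick illustrative sketch (my own example, using the built-in <code>mtcars</code> dataset rather than anything from a real project), a {dplyr} summary can be just a few lines:</p>
<pre><code class="language-r">library(dplyr)

# Average fuel economy and group size by number of cylinders
mtcars |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg), n = n())
</code></pre>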
<hr>
<h2 id="high-demand-in-analytics-research-and-healthcare">High demand in analytics, research, and healthcare</h2>
<p>R is especially popular in sectors such as:</p>
<ul>
<li>Healthcare &#038; biostats</li>
<li>Academic research</li>
<li>Government departments</li>
<li>Finance &#038; risk modeling</li>
<li>Pharmaceutical companies</li>
</ul>
<p>Here are some examples of R in production use:</p>
<ul>
<li>The <a href="https://github.com/bbc/bbplot" rel="nofollow" target="_blank">{bbplot} R package</a>. Yes, the BBC use R to create graphics for their website!</li>
<li>Health and wellbeing profiling <a href="https://shiny.posit.co/r/gallery/government-public-sector/scotpho-profiles/" rel="nofollow" target="_blank">app</a> for the NHS</li>
<li>During the Covid-19 pandemic, the Financial Times ran a <a href="https://www.ft.com/content/a2901ce8-5eb7-4633-b89c-cbdf5b386938" rel="nofollow" target="_blank">stats tracker</a> whose graphs were built with R.</li>
</ul>
<p>Knowing some R will give you a competitive edge if you’re looking to work in these sectors.</p>
<hr>
<h2 id="open-source-with-the-backing-of-posit">Open source with the backing of Posit</h2>
<p>R is open source. This means that:</p>
<ul>
<li>It’s free, and always will be!</li>
<li>Anyone can view the source code that makes up R.</li>
<li>Most R packages make their source code publicly available, for example on <a href="https://github.com/" rel="nofollow" target="_blank">GitHub.com</a>, for everyone to see.</li>
<li>It has a large community of contributors. There are great forums for getting help, such as <a href="https://stackoverflow.com/questions/tagged/r?tab=Votes" rel="nofollow" target="_blank">Stack Overflow</a>, <a href="https://forum.posit.co/" rel="nofollow" target="_blank">Posit Community</a>, the <a href="https://rweekly.org/" rel="nofollow" target="_blank">R Weekly newsletter</a>, and many more.</li>
<li>Thousands of packages provide functionality beyond what paid software such as SPSS, SAS, or Excel offers.</li>
</ul>
<p><a href="https://posit.co/" rel="nofollow" target="_blank">Posit</a>, who maintain the free-to-use RStudio and Positron IDEs (integrated development environments), have many full-time staff working on maintaining and building new functionality for the R ecosystem. This means we get:</p>
<ul>
<li>Defined accountability</li>
<li>Predictable release cycles</li>
<li>Faster bug fixes</li>
</ul>
<hr>
<h2 id="incredible-data-visualisation-possibilities">Incredible data visualisation possibilities</h2>
<p>Being able to communicate your findings with stakeholders is very important in data science, and one of R’s biggest strengths is visualisation and reporting.</p>
<p>With the <strong>{ggplot2}</strong> package, you can create publication-ready charts with very little code. The <a href="https://r-graph-gallery.com/best-r-chart-examples.html" rel="nofollow" target="_blank">R Graph Gallery</a> has some amazing examples of what is possible with {ggplot2}.</p>
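<p>As a minimal sketch (again my own example on the built-in <code>mtcars</code> data, not a chart from the gallery), a complete {ggplot2} plot looks like this:</p>
<pre><code class="language-r">library(ggplot2)

# Scatter plot of weight vs. fuel economy, coloured by cylinder count
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
  theme_minimal()
</code></pre>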
<p>With the <strong>{quarto}</strong> and <strong>{shiny}</strong> packages, you can build reproducible reports and interactive dashboards, all without needing to know any HTML, CSS or JavaScript.</p>
<h2 id="beginner-friendly-learning-curve">Beginner friendly learning curve</h2>
<p>This is very much my own opinion, but compared to other languages, I think R is fairly intuitive and feels rewarding much earlier in the journey. It also has (in my opinion) the most beginner-friendly program to code in: RStudio.</p>
<p>Most people attend only two days’ worth of training with Jumping Rivers, and say they feel ready to start tackling their own data problems.</p>
<hr>
<h2 id="so-is-r-worth-learning-in-2026">So… is R worth learning in 2026?</h2>
<p>I think so. If you want pure software engineering or large-scale production systems, you may need Python. But for becoming a <strong>strong data thinker</strong>, and giving you an edge in your analysis, R is one of the best starting points.</p>
<p>
For updates and revisions to this article, see the <a href = "https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/">original post</a>
</p>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.jumpingrivers.com/blog/why-learning-r-is-a-good-career-move-in-2026/"> The Jumping Rivers Blog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/why-learning-r-is-a-good-career-move-in-2026/">Why Learning R is a Good Career Move in 2026</a>]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">400093</post-id>	</item>
		<item>
		<title>ECMLE on CRAN</title>
		<link>https://www.r-bloggers.com/2026/03/ecmle-on-cran/</link>
		
		<dc:creator><![CDATA[xi'an]]></dc:creator>
		<pubDate>Thu, 26 Mar 2026 23:26:05 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://xianblog.wordpress.com/?p=62503</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> x
</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/ecmle-on-cran/">ECMLE on CRAN</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://xianblog.wordpress.com/2026/03/27/ecmle-on-cran/"> R – Xi&#039;an&#039;s Og</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p><a href="https://www.r-pkg.org/pkg/ECMLE" rel="nofollow" target="_blank"><img loading="lazy" data-attachment-id="62504" data-permalink="https://xianblog.wordpress.com/2026/03/27/ecmle-on-cran/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle-pdf/" data-orig-file="https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png" data-orig-size="1017,1153" data-comments-opened="1" data-image-meta="{"aperture":"0","credit":"","camera":"","caption":"","created_timestamp":"0","copyright":"","focal_length":"0","iso":"0","shutter_speed":"0","title":"","orientation":"0","alt":""}" data-image-title="Screenshot 2026-03-26 at 20-20-50 ECMLE Approximating Evidence via Bounded Harmonic Means &#8211; ECMLE.pdf" data-image-description="" data-image-caption="" data-medium-file="https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?w=265" data-large-file="https://i2.wp.com/xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?resize=450%2C510&#038;ssl=1" class="aligncenter wp-image-62504 size-large" title="As a companion to our recent paper on Approximating evidence via bounded harmonic means, an R package named ECMLE for elliptical covering marginal likelihood estimator has been accepted by CRAN and is now available. The coding and deposit were made by my PhD student, Dana Naderi, main author of the paper." 
src="https://i2.wp.com/xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?resize=450%2C510&#038;ssl=1" alt="" width="450" height="510" srcset_temp="https://i2.wp.com/xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?resize=450%2C510&#038;ssl=1 450w, https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?w=900 900w, https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?w=85 85w, https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?w=265 265w, https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?w=768 768w" sizes="(max-width: 450px) 100vw, 450px" data-recalc-dims="1" /></a>x</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://xianblog.wordpress.com/2026/03/27/ecmle-on-cran/"> R – Xi&#039;an&#039;s Og</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/ecmle-on-cran/">ECMLE on CRAN</a>]]></content:encoded>
					
		
		<enclosure url="https://0.gravatar.com/avatar/3bddf040412784bc8ff54f0b6353b2c283c3eb7e11daccf2b3bfa95b469e4029?s=96&#038;d=https://s0.wp.com/i/mu.gif&#038;r=G" length="0" type="" />
<enclosure url="https://xianblog.wordpress.com/wp-content/uploads/2026/03/screenshot-2026-03-26-at-20-20-50-ecmle-approximating-evidence-via-bounded-harmonic-means-ecmle.pdf.png?w=450" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400115</post-id>	</item>
		<item>
		<title>February 2026 Top 40 New CRAN Packages</title>
		<link>https://www.r-bloggers.com/2026/03/february-2026-top-40-new-cran-packages/</link>
		
		<dc:creator><![CDATA[Joseph Rickert]]></dc:creator>
		<pubDate>Thu, 26 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://rworks.dev/posts/Feb-2026-Top40/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>Two hundred and fifty-five of the new packages submitted to CRAN in February were still there in mid-March. Here are my Top 40 picks in seventeen categories: Artificial Intelligence, Biology, Buddhism, Climate Science, Computational Methods, Dat...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/february-2026-top-40-new-cran-packages/">February 2026 Top 40 New CRAN Packages</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://rworks.dev/posts/Feb-2026-Top40/"> R Works</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<p>Two hundred and fifty-five of the new packages submitted to CRAN in February were still there in mid-March. Here are my Top 40 picks in seventeen categories: Artificial Intelligence, Biology, Buddhism, Climate Science, Computational Methods, Data, Ecology, Epidemiology, Genomics, Machine Learning, Medical Applications, Physics, Statistics, Surveys, Time Series, Utilities, and Visualization.</p>
<div class="columns">
<div class="column" style="width:45%;">
<section id="artificial-intelligence" class="level3">
<h3 class="anchored" data-anchor-id="artificial-intelligence">Artificial Intelligence</h3>
<p><a href="https://cran.r-project.org/package=quallmer" rel="nofollow" target="_blank">quallmer</a> v0.3.0: Provides tools for AI-assisted qualitative data coding using large language models (‘LLMs’) via the <code>ellmer</code> package, supporting providers including <em>OpenAI</em>, <em>Anthropic</em>, <em>Google</em>, <em>Azure</em>, and local models via <em>Ollama</em>, with built-in <em>codebooks</em> for common applications such as sentiment analysis and policy coding. Functions enable creating custom codebooks, support systematic replication across models and settings, compute inter-coder reliability statistics and validation metrics, and provide audit trails for documenting coding workflows following <a href="https://www.amazon.com/Naturalistic-Inquiry-Yvonna-S-Lincoln/dp/0803924313" rel="nofollow" target="_blank">Lincoln and Guba’s (1985)</a> framework for establishing trustworthiness in qualitative research. See the <a href="https://cran.r-project.org/web/packages/quallmer/vignettes/getting-started.html" rel="nofollow" target="_blank">vignette</a> to get started.</p>
</section>
<section id="biology" class="level3">
<h3 class="anchored" data-anchor-id="biology">Biology</h3>
<p><a href="https://cran.r-project.org/package=BioGSP" rel="nofollow" target="_blank">BioGSP</a> v1.0.0: Implements Graph Signal Processing methods, including the Spectral Graph Wavelet Transform, for analyzing spatial patterns in biological data, and provides tools for multi-scale analysis of biological spatial signals, including forward and inverse transforms, energy analysis, and visualization functions tailored for biological applications. See <a href="https://www.sciencedirect.com/science/article/pii/S1063520310000552?via%3Dihub" rel="nofollow" target="_blank">Hammond, Vandergheynst, and Gribonval (2011)</a> and <a href="https://www.biorxiv.org/content/10.1101/2024.12.20.629650v1" rel="nofollow" target="_blank">Yao et al. (2024)</a> for biological application examples, and the <a href="https://cran.r-project.org/web/packages/BioGSP/vignettes/sgwt_simulation_demo.html" rel="nofollow" target="_blank">vignette</a> to get started.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/BioGSP.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-1" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/BioGSP.png?w=578&#038;ssl=1" class="img-fluid" alt="Plots of Fourier Modes" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=RuHere" rel="nofollow" target="_blank">RuHere</a> v1.0.1: Automatically flags common spatial errors in biological data using metadata, a six-stage workflow, and functions that specifically integrate specialist-curated range information to identify geographic errors and introductions that often escape standard automated validation procedures. For details on the methodology see <a href="https://www.biorxiv.org/content/10.64898/2026.02.02.703373v1" rel="nofollow" target="_blank">Trindade &#038; Caron (2026)</a>. There are five vignettes including <a href="https://cran.r-project.org/web/packages/RuHere/vignettes/sampling_bias.html" rel="nofollow" target="_blank">Reducing sampling bias</a> and <a href="https://cran.r-project.org/web/packages/RuHere/vignettes/spatial_consistency.html" rel="nofollow" target="_blank">Ensuring spatial consistency</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/RuHere.jpeg?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-2" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/RuHere.jpeg?w=578&#038;ssl=1" class="img-fluid" alt="Snapshot of interactive map" data-recalc-dims="1"></a></p>
</section>
<section id="buddhism" class="level3">
<h3 class="anchored" data-anchor-id="buddhism">Buddhism</h3>
<p><a href="https://cran.r-project.org/package=tipitaka.critical" rel="nofollow" target="_blank">tipitaka.critical</a> v1.0.0: A lemmatized critical edition of the complete Pali Canon (Tipitaka), the canonical scripture of Theravadin Buddhism. Based on a five-witness collation of the Pali Text Society edition via GRETIL, SuttaCentral, the Vipassana Research Institute Chattha Sangayana edition, the Buddha Jayanti Tipitaka, and the Thai Royal Edition. All text is lemmatized using the Digital Pali Dictionary, grouping inflected forms by dictionary headword. Covers all three pitakas, Sutta, Vinaya, Abhidhamma, with 5,777 individual text units. For background on the collation method, see <a href="https://github.com/dangerzig/tipitaka.critical" rel="nofollow" target="_blank">Zigmond (2026)</a> and the <a href="https://cran.r-project.org/web/packages/tipitaka.critical/vignettes/tipitaka-critical.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/tipitaka.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-3" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/tipitaka.png?w=578&#038;ssl=1" class="img-fluid" alt="PCA plot of all Tipitaka texts" data-recalc-dims="1"></a></p>
</section>
<section id="climate-science" class="level3">
<h3 class="anchored" data-anchor-id="climate-science">Climate Science</h3>
<p><a href="https://cran.r-project.org/package=tidyextreme" rel="nofollow" target="_blank">tidyextreme</a> v1.00: Provides functions to calculate <a href="https://www.wcrp-climate.org/etccdi" rel="nofollow" target="_blank">Expert Team on Climate Change Detection and Indices</a> (ETCCDI) indices from daily or hourly temperature and precipitation data, along with functions for flexible data handling. See the <a href="https://cran.r-project.org/web/packages/tidyextreme/vignettes/tidyextreme-tutorial.html" rel="nofollow" target="_blank">vignette</a>.</p>
</section>
<section id="computational-methods" class="level3">
<h3 class="anchored" data-anchor-id="computational-methods">Computational Methods</h3>
<p><a href="https://cran.r-project.org/package=compositional.mle" rel="nofollow" target="_blank">compositional.mle</a> v2.0.0: Provides composable optimization strategies for maximum likelihood estimation. Solvers are first-class functions that combine via sequential chaining, parallel racing, and random restarts. Implements gradient ascent, Newton-Raphson, quasi-Newton (BFGS), and derivative-free methods with support for constrained optimization and tracing. Returns <code>mle</code> objects compatible with <code>algebraic.mle</code> for downstream analysis. Methods are based on <a href="https://link.springer.com/book/10.1007/978-0-387-40065-5" rel="nofollow" target="_blank">Nocedal J, Wright SJ (2006)</a>. There are five vignettes including <a href="https://cran.r-project.org/web/packages/compositional.mle/vignettes/getting-started.html" rel="nofollow" target="_blank">Getting Started</a> and <a href="https://cran.r-project.org/web/packages/compositional.mle/vignettes/theory-and-intuition.html" rel="nofollow" target="_blank">Theory and Intuition Behind Numerical MLE</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/compmle.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-4" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/compmle.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of log-likelihood surface" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=nabla" rel="nofollow" target="_blank">nabla</a> v0.7.1: Enables exact automatic differentiation for <code>R</code> functions and provides a composable derivative operator D that computes gradients, Hessians, Jacobians, and arbitrary-order derivative tensors at machine precision. D(D(f)) gives Hessians, D(D(D(f))) gives third-order tensors for skewness of maximum likelihood estimators, and so on to any order. Works through any R code including loops, branches, and control flow. There are five vignettes including an <a href="https://cran.r-project.org/web/packages/nabla/vignettes/introduction.html" rel="nofollow" target="_blank">Introduction</a> and <a href="https://cran.r-project.org/web/packages/nabla/vignettes/mle-workflow.html" rel="nofollow" target="_blank">Gradient and Hessian Computation</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/nabla.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-5" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/nabla.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of log-likelihood and gradient" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=rLifting" rel="nofollow" target="_blank">rLifting</a> v0.9.0: Performs Wavelet Lifting Transforms focusing on signal denoising and functional data analysis (FDA). Implements a hybrid architecture with a zero-allocation <code>C++</code> core for high-performance processing. Features include unified offline (batch) denoising; causal, real-time filtering using a ring-buffer engine; and adaptive recursive thresholding. There are five vignettes including an <a href="https://cran.r-project.org/web/packages/rLifting/vignettes/introduction.html" rel="nofollow" target="_blank">Introduction</a> and <a href="https://cran.r-project.org/web/packages/rLifting/vignettes/realtime.html" rel="nofollow" target="_blank">Real-time signal smoothing</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/rLifting.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-6" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/rLifting.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot showing real-time denoising of a noisy sine wave" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=setweaver" rel="nofollow" target="_blank">setweaver</a> v1.0.0: Creates sets of variables based on mutual information. In this context, a set is a collection of distinct elements (e.g., variables) that can also be treated as a single entity. Mutual information quantifies the dependence between two variables by expressing how much information about one variable can be gained from observing the other. See the <a href="https://cran.r-project.org/web/packages/setweaver/vignettes/setweaver.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/setweaver.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-7" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/setweaver.png?w=578&#038;ssl=1" class="img-fluid" alt="Logistic Regression Effect Probability Network" data-recalc-dims="1"></a></p>
</section>
<section id="data" class="level3">
<h3 class="anchored" data-anchor-id="data">Data</h3>
<p><a href="https://cran.r-project.org/package=geobounds" rel="nofollow" target="_blank">geobounds</a> v0.1.1: Provides tools for downloading data from <a href="https://www.geoboundaries.org/" rel="nofollow" target="_blank">geoBoundaries</a>. Several administrative levels are available. See <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0231866" rel="nofollow" target="_blank">Runfola, D. et al. (2020)</a> for background and the <a href="https://cran.r-project.org/web/packages/geobounds/vignettes/geobounds.html" rel="nofollow" target="_blank">vignette</a> for examples.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/geobounds.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-8" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/geobounds.png?w=578&#038;ssl=1" class="img-fluid" alt="World Bank Income Group LatinAmerica andthe Caribbean" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=healthbR" rel="nofollow" target="_blank">healthbR</a> v0.2.0: Provides easy access to Brazilian public health data from multiple sources including VIGITEL (Surveillance of Risk Factors for Chronic Diseases by Telephone Survey), PNS (National Health Survey), PNAD Continua (Continuous National Household Sample Survey), POF (Household Budget Survey with food security and consumption data), Censo Demografico (population denominators), SIM (Mortality Information System), SINASC (Live Birth Information System) and several other repositories. Data is downloaded from the Brazilian Ministry of Health and is returned in tidy format following tidyverse conventions. There are seventeen vignettes including an <a href="https://cran.r-project.org/web/packages/healthbR/vignettes/healthbR.html" rel="nofollow" target="_blank">Introduction</a> and <a href="https://cran.r-project.org/web/packages/healthbR/vignettes/vigitel-telephone-survey.html" rel="nofollow" target="_blank">Chronic Disease Risk Factors</a>.</p>
<p><a href="https://cran.r-project.org/package=nhanesdata" rel="nofollow" target="_blank">nhanesdata</a> v4.1.0: Instant access to harmonized National Health and Nutrition Examination Survey <a href="https://www.cdc.gov/nchs/nhanes/" rel="nofollow" target="_blank">NHANES</a> data spanning 1999-2023. Retrieve pre-processed datasets from reliable cloud storage with automatic type reconciliation and integrated search tools for variables and datasets. Simplifies NHANES data workflows by handling cycle management and maintaining data consistency across survey waves. There are four vignettes including <a href="https://cran.r-project.org/web/packages/nhanesdata/vignettes/getting-started.html" rel="nofollow" target="_blank">Getting Started</a> and <a href="https://cran.r-project.org/web/packages/nhanesdata/vignettes/getting-started.html" rel="nofollow" target="_blank">Available NAHANES Datasets</a>.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/NHANES.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-9" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/NHANES.png?w=578&#038;ssl=1" class="img-fluid" alt="Plots of NHANES Age Distribution bySurvey Cycle" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=roadDB" rel="nofollow" target="_blank">roadDB</a> v1.1.0: Provides interface to the ROCEEH Out of Africa Database (<a href="https://www.roceeh.uni-tuebingen.de/roadweb/smarty_road_simple_search.php" rel="nofollow" target="_blank">ROAD</a>), a comprehensive resource for archaeological, anthropological, paleoenvironmental and geographic data from Africa and Eurasia dating from 3,000,000 to 20,000 years BP. Users can retrieve data from the online database at different levels of detail and customize search requests. See <a href="https://cran.r-project.org/web/packages/roadDB/readme/README.html" rel="nofollow" target="_blank">README</a> to get started.</p>
</section>
<section id="ecology" class="level3">
<h3 class="anchored" data-anchor-id="ecology">Ecology</h3>
<p><a href="https://cran.r-project.org/package=spacemodR" rel="nofollow" target="_blank">spacemodR</a> v0.1.3: Provides tools for modeling food web transfer based on an initial ground raster. It provides a directed acyclic graph structure for a set of rasters representing the flow of elements (e.g., food, energy, contaminants). It also includes tools for working with dispersal algorithms, enabling the combination of flux data with population movement. See the <a href="https://cran.r-project.org/web/packages/spacemodR/vignettes/Tutorial.html" rel="nofollow" target="_blank">tutorial</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/spacemodR.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-10" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/spacemodR.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of OCS GE (Occupation du Sol à Grande Échelle) data set" data-recalc-dims="1"></a></p>
</section>
<section id="epidemiology" class="level3">
<h3 class="anchored" data-anchor-id="epidemiology">Epidemiology</h3>
<p><a href="https://cran.r-project.org/package=baselinenowcast" rel="nofollow" target="_blank">baselinenowcast</a> v0.2.0: Provides nowcasting methods based on using empirical delay distributions and uncertainty from past performance as well as a baseline method for developers of new nowcasting methods. The package supports standard data frame inputs as well as the direct use of reporting triangles, and is compatible with <a href="https://www.epinowcast.org/" rel="nofollow" target="_blank">epinowcast</a> objects and accommodates a wide spectrum of reporting schedules, including mixed patterns of reference and reporting (daily-weekly, weekly-daily). For background see <a href="https://wellcomeopenresearch.org/articles/10-614" rel="nofollow" target="_blank">Johnson et al. (2026)</a>. There are five vignettes including <a href="https://cran.r-project.org/web/packages/baselinenowcast/vignettes/baselinenowcast.html" rel="nofollow" target="_blank">Getting Started</a> and <a href="https://rworks.dev/posts/Feb-2026-Top40/" rel="nofollow" target="_blank">Mathematical Methods</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/nowcast.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-11" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/nowcast.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot comparing initially reported with subsequently observed cases" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=multigroup.vaccine" rel="nofollow" target="_blank">multigroup.vaccine</a> v0.1.1: Provides functions for modeling infectious disease dynamics in populations with multiple subgroups having different vaccination rates, transmission characteristics, and contact patterns. Enables calculating outbreak sizes, automatically fetching U.S. census data, and exploring vaccination scenarios with an interactive <code>shiny</code> dashboard. See <a href="https://www.valueinhealthjournal.com/article/S1098-3015(24)00154-2/fulltext?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS1098301524001542%3Fshowall%3Dtrue" rel="nofollow" target="_blank">Nguyen et al. (2024)</a> and <a href="https://academic.oup.com/ofid/article/13/Supplement_1/ofaf695.217/8420075?login=false" rel="nofollow" target="_blank">Duong et al. (2026)</a> for background. There are four vignettes including <a href="https://cran.r-project.org/web/packages/multigroup.vaccine/vignettes/run_model_on_command_line.html" rel="nofollow" target="_blank">Getting Started</a> and <a href="https://cran.r-project.org/web/packages/multigroup.vaccine/vignettes/measles_agemodel.html" rel="nofollow" target="_blank">Measles Age-Structured Model</a>.</p>
<p><a href="https://cran.r-project.org/package=ViroReportR" rel="nofollow" target="_blank">ViroReportR</a> v1.0.4: Implements tools for reporting and forecasting viral respiratory infections, using case surveillance data. Report generation tools for short-term forecasts, and validation metrics for an arbitrary number of customizable respiratory viruses. Estimation of the effective reproduction number is based on the <em>EpiEstim</em> framework described in work by <a href="https://academic.oup.com/aje/article/178/9/1505/89262?login=false" rel="nofollow" target="_blank">Cori et al. (2013)</a>. See the <a href="https://academic.oup.com/aje/article/178/9/1505/89262?login=false" rel="nofollow" target="_blank">vignette</a>.</p>
</section>
<section id="genomics" class="level3">
<h3 class="anchored" data-anchor-id="genomics">Genomics</h3>
<p><a href="https://cran.r-project.org/package=archipelago" rel="nofollow" target="_blank">archipelago</a> v0.1.0: Provides a graphical method for joint visualization of Variant Set Association Test results and individual variant association statistics. The Archipelago method assigns genomic coordinates to variant set statistics, allowing simultaneous display of variant-level and set-level signals in a unified plot supporting interpretation of both collective and individual variant contributions. For more see <a href="https://onlinelibrary.wiley.com/doi/10.1002/gepi.70025" rel="nofollow" target="_blank">Lawless et al. (2026)</a> and the <a href="https://cran.r-project.org/web/packages/archipelago/vignettes/archipelago-intro.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/archipelago.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-12" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/archipelago.png?w=578&#038;ssl=1" class="img-fluid" alt="Example of an Archipelago plot" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/web/packages/SVG/vignettes/SVG-introduction.html" rel="nofollow" target="_blank">SVG</a> v1.0.0: Implements a unified framework for detecting spatially variable genes in spatial transcriptomics data. SVG detection methods including MERINGUE (Moran’s I based spatial autocorrelation), Giotto binSpect (binary spatial enrichment test), SPARK-X (non-parametric kernel-based test), and nnSVG (nearest-neighbor Gaussian processes) which are described in <a href="https://genome.cshlp.org/content/31/10/1843" rel="nofollow" target="_blank">Miller et al. (2021)</a>, <a href="https://link.springer.com/article/10.1186/s13059-021-02286-2" rel="nofollow" target="_blank">Dries et al. (2021)</a>, <a href="https://link.springer.com/article/10.1186/s13059-021-02404-0" rel="nofollow" target="_blank">Zhu et al. (2021)</a>, and <a href="https://www.nature.com/articles/s41467-023-39748-z" rel="nofollow" target="_blank">Weber et al. (2023)</a>. See the <a href="https://cran.r-project.org/web/packages/SVG/vignettes/SVG-introduction.html" rel="nofollow" target="_blank">vignette</a> for examples.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/SVG.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-13" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/SVG.png?w=578&#038;ssl=1" class="img-fluid" alt="Spatial pattern visualization" data-recalc-dims="1"></a></p>
</section>
</div><div class="column" style="width:10%;">

</div><div class="column" style="width:45%;">
<section id="machine-learning" class="level3">
<h3 class="anchored" data-anchor-id="machine-learning">Machine Learning</h3>
<p><a href="https://cran.r-project.org/package=nadir" rel="nofollow" target="_blank">nadir</a> v0.0.1: Provides a functional programming implementation of the super learner algorithm, <a href="https://biostats.bepress.com/ucbbiostat/paper222/" rel="nofollow" target="_blank">van der Laan et al. (2007)</a>, with an emphasis on supporting the use of formulas to specify learners. Includes the ability to use random-effects specified in formulas e.g. (y ~ (age | strata) + …) and to construct new learners by passing a functions. See the <a href="https://cran.r-project.org/web/packages/nadir/vignettes/Basic-Examples.html" rel="nofollow" target="_blank">vignette</a> for basic examples.</p>
</section>
<section id="medical-applications" class="level3">
<h3 class="anchored" data-anchor-id="medical-applications">Medical Applications</h3>
<p><a href="https://cran.r-project.org/package=bfbin2arm" rel="nofollow" target="_blank">bfbin2arm</a> v0.1.0: Provides tools to design and analyze two-arm binomial clinical (phase II) trials using Bayes factors. Implements Bayes factors for point-null and directional hypotheses, predictive densities under different hypotheses, and power and sample size calibration with optional frequentist type-I error and power. See the <a href="https://cran.r-project.org/web/packages/bfbin2arm/vignettes/bfbin2arm-intro.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/bfbin2arm.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-14" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/bfbin2arm.png?w=578&#038;ssl=1" class="img-fluid" alt="Power and Type 1 Error Rate Plots" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=neuromapr" rel="nofollow" target="_blank">neuromapr</a> v0.2.1: Implements spatial null models and coordinate-space transformations for statistical comparison of brain maps, following the framework described in <a href="https://www.nature.com/articles/s41592-022-01625-w" rel="nofollow" target="_blank">Markello et al. (2022)</a>. Provides variogram-matching surrogates, Moran spectral randomization, and spin-based permutation tests. Includes an <code>R</code> interface to the <a href="https://netneurolab.github.io/neuromaps/user_guide/annotations.html" rel="nofollow" target="_blank">neuromaps</a> annotation registry for browsing, downloading, and comparing brain map annotations from the Open Science Framework. There are five vignettes including <a href="https://cran.r-project.org/web/packages/neuromapr/vignettes/neuromapr.html" rel="nofollow" target="_blank">Getting Started</a> and <a href="https://cran.r-project.org/web/packages/neuromapr/vignettes/surface-geometry.html" rel="nofollow" target="_blank">Surface Geometry</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/neuromapr.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-15" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/neuromapr.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of Null Correlation Distributions" data-recalc-dims="1"></a></p>
</section>
<section id="physics" class="level3">
<h3 class="anchored" data-anchor-id="physics">Physics</h3>
<p><a href="https://cran.r-project.org/package=HaDeX2" rel="nofollow" target="_blank">HaDeX2</a> v1.0.0: Process, analyze and visualize Hydrogen Deuterium eXchange monitored by Mass Spectrometry experiments (HDX-MS) via a new reproducible workflow for the analysis of the HDX-MS data that includes uncertainty propagation, data aggregation and visualization on 3D structure, functions for data exploration, quality control and generation of publication-quality figures and a companion <code>Shiny</code> application. There are eleven vignettes including <a href="https://cran.r-project.org/web/packages/HaDeX2/vignettes/datafiles.html" rel="nofollow" target="_blank">Calculations</a> and <a href="https://cran.r-project.org/web/packages/HaDeX2/vignettes/visualization.html" rel="nofollow" target="_blank">Data Visualization</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/HaDeX2.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-16" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/HaDeX2.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of Deuterium uptake vs Peptide ID" data-recalc-dims="1"></a></p>
</section>
<section id="statistics" class="level3">
<h3 class="anchored" data-anchor-id="statistics">Statistics</h3>
<p><a href="https://cran.r-project.org/package=BCFM" rel="nofollow" target="_blank">BCFM</a> v1.0.0: Implements the Bayesian Clustering Factor Models for simultaneous clustering and latent factor analysis of multivariate longitudinal data. The model accounts for within-cluster dependence through shared latent factors while allowing heterogeneity across clusters, enabling flexible covariance modeling in high-dimensional settings. The methodology is described in <a href="https://onlinelibrary.wiley.com/doi/10.1002/sim.70350" rel="nofollow" target="_blank">Shin, Ferreira, and Tegge (2018)</a>. See the <a href="https://cran.r-project.org/web/packages/BCFM/vignettes/introduction-to-BCFM.html" rel="nofollow" target="_blank">vignette</a> for examples.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/BCFM.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-17" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/BCFM.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of posterior densities for cluster probabilities" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=clmstan" rel="nofollow" target="_blank">clmstan</a> v0.1.1: Provides functions to fit cumulative link models for ordinal categorical data using <code>CmdStanR</code>. Supports various link functions including logit, probit, cloglog, loglog, cauchit, and flexible parametric links such as Generalized Extreme Value, Asymmetric Exponential Power, and Symmetric Power. Methods are described in <a href="https://onlinelibrary.wiley.com/doi/10.1111/j.1467-842X.2011.00601.x" rel="nofollow" target="_blank">Agresti (2010)</a>, <a href="https://link.springer.com/article/10.1007/s10651-010-0154-8" rel="nofollow" target="_blank">Wang and Dey (2011)</a>, and <a href="https://dl.acm.org/doi/abs/10.1007/s11222-014-9449-1" rel="nofollow" target="_blank">Naranjo, Perez, and Martin (2015)</a>. See the <a href="https://cran.r-project.org/web/packages/clmstan/vignettes/getting-started.html" rel="nofollow" target="_blank">vignette</a> to get started.</p>
<p><a href="https://cran.r-project.org/package=dtms" rel="nofollow" target="_blank">dtms</a> v0.4.2: Implements discrete-time multistate models, several ways of estimating parametric and nonparametric multistate models, and an extensive set of Markov chain methods which use transition probabilities derived from the multistate model. See <a href="https://www.tandfonline.com/doi/full/10.1080/00324728.2023.2176535" rel="nofollow" target="_blank">Schneider et al. (2024)</a>, <a href="https://journals.sagepub.com/doi/10.1177/0049124118782541" rel="nofollow" target="_blank">Dudel (2021)</a>, <a href="https://link.springer.com/article/10.1186/s12963-020-00217-0" rel="nofollow" target="_blank">Dudel &#038; Myrskylä (2020)</a>, and <a href="https://www.taylorfrancis.com/books/mono/10.1201/9781315374321/multi-state-survival-models-interval-censored-data-ardo-van-den-hout" rel="nofollow" target="_blank">van den Hout (2017)</a> for background and <a href="https://cran.r-project.org/web/packages/dtms/readme/README.html" rel="nofollow" target="_blank">README</a> to get started.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/dtms.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-18" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/dtms.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of evolution of transition probabilities" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=GAReg" rel="nofollow" target="_blank">GAReg</a> v0.1.0: Provides a genetic algorithm framework for regression problems requiring discrete optimization over model spaces with unknown or varying dimension, where gradient-based methods and exhaustive enumeration are impractical. The computation is built on the <em>GA</em> engine of <a href="https://journal.r-project.org/articles/RJ-2017-008/index.html" rel="nofollow" target="_blank">Scrucca (2017)</a>, and <em>changepointGA</em> engine from <a href="https://arxiv.org/abs/2410.15571" rel="nofollow" target="_blank">Li and Lu (2024)</a>. In challenging high-dimensional settings, functions enable efficient search and delivers near-optimal solutions. See the <a href="https://cran.r-project.org/web/packages/GAReg/vignettes/vignette.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/GAReg.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-19" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/GAReg.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot showing spline options" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=mixpower" rel="nofollow" target="_blank">mixpower</a> v0.1.0: Implements a simulation-based toolkit for power and sample-size analysis for linear and generalized linear mixed-effects models (LMMs and GLMMs). Supports Gaussian, binomial, Poisson, and negative binomial families via <code>lme4</code>; Wald and likelihood-ratio tests; multi-parameter sensitivity grids; power curves and minimum sample-size solvers; parallel evaluation with deterministic seeds; and functions for reproducibility. Run time diagnostics include failure rate, singular-fit rate, effective N and publication-ready summary tables. There are five brief vignettes including an <a href="https://cran.r-project.org/web/packages/mixpower/vignettes/mixpower-intro.html" rel="nofollow" target="_blank">Introduction</a> and <a href="https://cran.r-project.org/web/packages/mixpower/vignettes/mixpower-simulations.html" rel="nofollow" target="_blank">Running simulations</a>.</p>
<p><a href="https://cran.r-project.org/package=rblimp" rel="nofollow" target="_blank">rblimp</a> v1.0.: Provides an interface to <a href="https://www.appliedmissingdata.com/blimp" rel="nofollow" target="_blank"><code>Blimp</code></a> software for Bayesian latent variable modeling, missing data analysis, and multiple imputation. The package generates <code>Blimp</code> syntax, executes <code>Blimp</code> models, and imports results back into <code>R</code> as structured objects with methods for visualization and analysis. See <a href="https://cran.r-project.org/web/packages/rblimp/readme/README.html" rel="nofollow" target="_blank">README</a> to get started.</p>
<p><a href="https://cran.r-project.org/package=rareflow" rel="nofollow" target="_blank">rareflow</a> v0.1.0: Provides variational flow-based methods for modeling rare events using Kullback–Leibler divergence, normalizing flows, Girsanov change of measure, and Freidlin–Wentzell action functionals and tools for rare-event inference, minimum-action paths, and quasi-potential computation in stochastic dynamical systems. Methods are based on <a href="https://arxiv.org/abs/1505.05770" rel="nofollow" target="_blank">Rezende and Mohamed (2015)</a>, <a href="https://epubs.siam.org/doi/10.1137/1105027" rel="nofollow" target="_blank">Girsanov (1960)</a>, and <a href="https://link.springer.com/book/10.1007/978-3-642-25847-3" rel="nofollow" target="_blank">Freidlin and Wentzell (2012)</a>. See the <a href="https://cran.r-project.org/web/packages/rareflow/vignettes/rareflow.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/rareflow.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-20" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/rareflow.png?w=578&#038;ssl=1" class="img-fluid" alt="2D potential plot" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=sshist" rel="nofollow" target="_blank">sshist</a> v0.1.3: Implements the Shimazaki-Shinomoto method for optimizing the bin width of a histogram. This method minimizes the mean integrated squared error and features a <code>C++</code> backend for high performance and shift-averaging to remove edge-position bias. Ideally suited for time-dependent rate estimation and identifying intrinsic data structures. Supports both 1D and 2D data distributions. See <a href="https://direct.mit.edu/neco/article-abstract/19/6/1503/7188/A-Method-for-Selecting-the-Bin-Size-of-a-Time?redirectedFrom=fulltext" rel="nofollow" target="_blank">Shimazaki and Shinomoto (2007)</a> for more details and the <a href="https://cran.r-project.org/web/packages/sshist/vignettes/introduction.html" rel="nofollow" target="_blank">vignette</a> for an introduction.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/sshist.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-21" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/sshist.png?w=578&#038;ssl=1" class="img-fluid" alt="Plots showing optimal histogram for cost minimization" data-recalc-dims="1"></a></p>
</section>
<section id="surveys" class="level3">
<h3 class="anchored" data-anchor-id="surveys">Surveys</h3>
<p><a href="https://cran.r-project.org/package=heaping" rel="nofollow" target="_blank">heaping</a> v0.1.0: Provides methods for correcting heaping (digit preference) in survey data at the individual record level. Age heaping, where respondents disproportionately report ages ending in 0 or 5, is a common phenomenon that can distort demographic analyses. Unlike traditional smoothing methods that only correct aggregated statistics, this package corrects individual values by replacing a calculated proportion of heaped observations with draws from fitted truncated distributions. See the <a href="https://cran.r-project.org/web/packages/heaping/vignettes/heaping-intro.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/heaping.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-22" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/heaping.png?w=578&#038;ssl=1" class="img-fluid" alt="Plots showing corrections for heaping" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=metasurvey" rel="nofollow" target="_blank">metasurvey</a> v0.0.21: Provides a step-based pipeline for reproducible survey data processing, building on the <code>survey</code> package for complex sampling designs. Supports rotating panels with bootstrap replicate weights, and provides a recipe system for sharing and reproducing data transformation workflows across survey editions. There are thirteen vignettes including <a href="https://cran.r-project.org/web/packages/metasurvey/vignettes/getting-started.html" rel="nofollow" target="_blank">Getting Started</a> and <a href="https://cran.r-project.org/web/packages/metasurvey/vignettes/complex-designs.html" rel="nofollow" target="_blank">Survey design and Validation</a>.</p>
</section>
<section id="time-series" class="level3">
<h3 class="anchored" data-anchor-id="time-series">Time Series</h3>
<p><a href="https://cran.r-project.org/package=mhpfilter" rel="nofollow" target="_blank">mhpfilter</a> v0.1.0: Implements the Modified Hodrick-Prescott Filter for decomposing macroeconomic time series into trend and cyclical components via efficient <code>C++</code> routines. Unlike the standard HP filter, functions estimate series-specific lambda values that minimize the GCV criterion. See <a href="https://www.tandfonline.com/doi/abs/10.1080/00036846.2014.894631" rel="nofollow" target="_blank">Choudhary, Hanif and Iqbal (2014)</a>, and <a href="https://www.elibrary.imf.org/view/journals/024/1997/001/article-A003-en.xml" rel="nofollow" target="_blank">Coe and McDermott (1997)</a> for background. There is an <a href="https://cran.r-project.org/web/packages/mhpfilter/vignettes/introduction.html" rel="nofollow" target="_blank">Introduction</a> and a vignette on <a href="https://cran.r-project.org/web/packages/mhpfilter/vignettes/methodology.html" rel="nofollow" target="_blank">Modified HP Filter Theory</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/mhpfilter.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-23" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/mhpfilter.png?w=578&#038;ssl=1" class="img-fluid" alt="Plots showing effects of lambda" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=RegimeChange" rel="nofollow" target="_blank">RegimeChange</a> v0.1.1: Implements a unified framework for detecting regime changes (changepoints) in time series data that includes both frequentist and Bayesian methods for univariate and multivariate series with detection of changes in mean, variance, trend, and distributional properties. See <a href="https://academic.oup.com/biomet/article-abstract/41/1-2/100/456627?redirectedFrom=fulltext&#038;login=false" rel="nofollow" target="_blank">Page (1954)</a>, <a href="https://www.tandfonline.com/doi/full/10.1080/01621459.2012.737745" rel="nofollow" target="_blank">Killick, Fearnhead, and Eckley (2012)</a> for frequentist methods and <a href="https://arxiv.org/abs/0710.3742" rel="nofollow" target="_blank">Adams and MacKay (2007)</a>. for Bayesian methods. There are three vignettes including and <a href="https://cran.r-project.org/web/packages/RegimeChange/vignettes/introduction.html" rel="nofollow" target="_blank">Introduction</a> and <a href="https://cran.r-project.org/web/packages/RegimeChange/vignettes/bayesian-methods.html" rel="nofollow" target="_blank">Bayesian Changepoint Detection</a>.</p>
<p><a href="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/RegimeChange.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-24" rel="nofollow" target="_blank"><img src="https://i0.wp.com/rworks.dev/posts/Feb-2026-Top40/RegimeChange.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of changepoint detection with PELT" data-recalc-dims="1"></a></p>
</section>
<section id="utilities" class="level3">
<h3 class="anchored" data-anchor-id="utilities">Utilities</h3>
<p><a href="https://cran.r-project.org/package=birddog" rel="nofollow" target="_blank">birddog</a> v1.0.0: Provides a unified set of methods to detect scientific emergence and technological trajectories in academic papers and patents by combining citation network analysis with community detection and attribute extraction, also applying natural language processing and structural topic modeling to uncover the contents of research communities. Applications of the method include: <a href="https://regepe.org.br/regepe/article/view/1742" rel="nofollow" target="_blank">Souza et al. (2022)</a> and <a href="https://www.mdpi.com/2071-1050/15/2/967" rel="nofollow" target="_blank">Maria et al. (2023)</a>. See <a href="https://cran.r-project.org/web/packages/birddog/readme/README.html" rel="nofollow" target="_blank">README</a> for the methodology and look <a href="https://roneyfraga.com/birddog/articles/introduction_birddog.html" rel="nofollow" target="_blank">here</a> for an introduction and examples.</p>
<p><a href="https://rworks.dev/posts/Feb-2026-Top40/birddog.svg" class="lightbox" data-gallery="quarto-lightbox-gallery-25" rel="nofollow" target="_blank"><img src="https://rworks.dev/posts/Feb-2026-Top40/birddog.svg" class="img-fluid" alt="Methodology Workflow"></a></p>
<p><a href="https://cran.r-project.org/package=phinterval" rel="nofollow" target="_blank">phinterval</a> v1.0.0: Implements the phinterval vector class for representing time spans that may contain gaps (disjoint intervals) or be empty. This class generalizes the <code>lubridate</code> package’s interval class to support vectorized set operations (intersection, union, difference, complement) that always return a valid time span, even when disjoint or empty intervals are created. See the <a href="https://cran.r-project.org/web/packages/phinterval/vignettes/phinterval.html" rel="nofollow" target="_blank">vignette</a>.</p>
</section>
<section id="visualization" class="level3">
<h3 class="anchored" data-anchor-id="visualization">Visualization</h3>
<p><a href="https://cran.r-project.org/package=dtGAP" rel="nofollow" target="_blank">dtGAP</a> v0.0.2: Provides supervised generalized association plots based on decision trees and enhances decision tree visualization by incorporating Generalized Association Plots through matrix-based visualizations including confusion matrix maps, decision tree matrix maps, and predicted class membership maps. See <a href="https://cran.r-project.org/web/packages/dtGAP/readme/README.html" rel="nofollow" target="_blank">README</a>.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/dtGAP.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-26" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/dtGAP.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of decision tree with heatmap" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=ggInterval" rel="nofollow" target="_blank">ggInterval</a> v0.2.4: Extends <code>ggplot2</code> to visualize symbolic interval-valued data with various plots via more general and flexible input arguments, and provides a function to transform classical data into symbolic data using both clustering algorithms and customized methods. See the <a href="https://cran.r-project.org/web/packages/ggInterval/vignettes/ggInterval_Intro.html" rel="nofollow" target="_blank">vignette</a>.</p>
<p><a href="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/ggInterval.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-27" rel="nofollow" target="_blank"><img src="https://i2.wp.com/rworks.dev/posts/Feb-2026-Top40/ggInterval.png?w=578&#038;ssl=1" class="img-fluid" alt="Plot of Index Image-Column Condition" data-recalc-dims="1"></a></p>
<p><a href="https://cran.r-project.org/package=nomiShape" rel="nofollow" target="_blank">nomiShape</a> v1.0.1: Provides tools for visualizing and analyzing the shape of discrete nominal frequency distributions and introduces centered frequency plots, in which nominal categories are ordered from the most frequent category at the center toward less frequent categories on both sides, facilitating the detection of distributional patterns such as uniformity, dominance, symmetry, skewness, and long-tail behavior. In addition, the package supports Pareto charts for the study of dominance and cumulative frequency structure in nominal data. There are twelve vignettes including <a href="https://cran.r-project.org/web/packages/nomiShape/vignettes/nominal_distribution_shapes.html" rel="nofollow" target="_blank">Visualizing and Analyzing Distributions of Nominal Variables</a> and <a href="https://cran.r-project.org/web/packages/nomiShape/vignettes/pareto.html" rel="nofollow" target="_blank">Pareto Plots for Nominal Distributions</a>.</p>
<p><a href="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/nomiShape.png?ssl=1" class="lightbox" data-gallery="quarto-lightbox-gallery-28" rel="nofollow" target="_blank"><img src="https://i1.wp.com/rworks.dev/posts/Feb-2026-Top40/nomiShape.png?w=578&#038;ssl=1" class="img-fluid" alt="Example of a Pareto Plot" data-recalc-dims="1"></a></p>
</section>
</div>
</div>



 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://rworks.dev/posts/Feb-2026-Top40/"> R Works</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/february-2026-top-40-new-cran-packages/">February 2026 Top 40 New CRAN Packages</a>]]></content:encoded>
					
		
		<enclosure url="https://rworks.dev/posts/Feb-2026-Top40/archipelago.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">400109</post-id>	</item>
		<item>
		<title>You shall know a word by the company it keeps — so choose your prompts wisely</title>
		<link>https://www.r-bloggers.com/2026/03/you-shall-know-a-word-by-the-company-it-keeps-so-choose-your-prompts-wisely/</link>
		
		<dc:creator><![CDATA[Pablo Bernabeu]]></dc:creator>
		<pubDate>Tue, 24 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/</guid>

					<description><![CDATA[<p>In computational linguistics, word meanings are shaped by their contexts. As the British linguist John Rupert Firth put it in 1957, ‘You shall know a word by the company it keeps’ (see Brunila &#038; LaViolette, 2022, for a re-examination of the intellectual history). It sounds almost like life advice, but Firth meant something ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/you-shall-know-a-word-by-the-company-it-keeps-so-choose-your-prompts-wisely/">You shall know a word by the company it keeps — so choose your prompts wisely</a>]]></description>
					<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on <strong><a href="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/"> Pablo Bernabeu</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>]. (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
</div>

<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/clipboard/clipboard.min.js"></script>
<link href="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/xaringanExtra-clipboard/xaringanExtra-clipboard.css" rel="stylesheet" />
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/xaringanExtra-clipboard/xaringanExtra-clipboard.js"></script>
<script>window.xaringanExtraClipboard(null, {"button":"Copy Code","success":"Copied!","error":"Press Ctrl+C to Copy"})</script>
<link href="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/htmltools-fill/fill.css" rel="stylesheet" />
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/htmlwidgets/htmlwidgets.js"></script>
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/plotly-binding/plotly.js"></script>
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/typedarray/typedarray.min.js"></script>
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/jquery/jquery.min.js"></script>
<link href="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/crosstalk/css/crosstalk.min.css" rel="stylesheet" />
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/crosstalk/js/crosstalk.min.js"></script>
<link href="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/plotly-htmlwidgets-css/plotly-htmlwidgets.css" rel="stylesheet" />
<script src="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/index.en_files/plotly-main/plotly-latest.min.js"></script>


<style type="text/css">
.figure p.caption, figcaption {
  text-align: left;
}
</style>
<script>
// Collapse code chunks by default in this post
document.addEventListener("DOMContentLoaded", function() {
  document.querySelectorAll("details").forEach(function(d) {
    d.open = false;
    var s = d.querySelector("summary");
    if (s) {
      s.textContent = "Expand";
      s.style.fontWeight = "bold";
      s.style.fontSize = "103%";
      s.style.color = "#379E8A";
    }
  });
});
</script>
<p>In computational linguistics, word meanings are shaped by their contexts. As the British linguist John Rupert Firth put it in 1957, ‘You shall know a word by the company it keeps’ (see <a href="https://doi.org/10.18653/v1/2022.naacl-main.327" rel="nofollow" target="_blank">Brunila &#038; LaViolette, 2022</a>, for a re-examination of the intellectual history). It sounds almost like life advice, but Firth meant something technical: words that habitually appear alongside each other tend to share semantic territory. The adjective ‘good’, for instance, is far more likely to appear near ‘kind’, ‘genuine’, ‘fair’ and ‘quality’ than near ‘broken’ or ‘fraud’ – and a model that tracks those neighbours can learn what ‘good’ means without ever being told. The principle extends to polysemy: ‘bank’ means something entirely different in the company of ‘river’ and ‘fishing rod’ than in the company of ‘overdraft’ and ‘mortgage’. Context is everything.</p>
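<p>As a toy illustration of the principle (a hypothetical mini-corpus invented for this sketch, not data from any study cited here), simply counting which words share a sentence with ‘bank’ already hints at its senses:</p>

```r
# Toy co-occurrence counts: which words keep company with "bank"?
toy <- c("the river bank was muddy",
         "we fished from the bank of the river",
         "the bank approved the mortgage",
         "an overdraft at the bank")
tokens <- strsplit(toy, " ")
co <- table(unlist(lapply(tokens, function(s)
  if ("bank" %in% s) setdiff(s, "bank"))))
sort(co, decreasing = TRUE)
# "river" co-occurs twice; "mortgage" and "overdraft" once each --
# and "the" tops the list, which is why stopwords are removed later on.
```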
<p>This deceptively simple insight is the bedrock on which generative AI was built. The earliest computational implementations of Firth’s principle – distributional semantic models such as Latent Semantic Analysis (LSA; <a href="https://doi.org/10.1037/0033-295X.104.2.211" rel="nofollow" target="_blank">Landauer &#038; Dumais, 1997</a>) and the Hyperspace Analogue to Language (<a href="https://doi.org/10.3758/BF03204766" rel="nofollow" target="_blank">Lund &#038; Burgess, 1996</a>) – were modest by today’s standards: a matrix of word co-occurrence counts, a few hundred latent dimensions and a vocabulary of perhaps tens of thousands of words. Yet even these pocket-sized models captured real-world structure with startling fidelity. <a href="https://doi.org/10.1111/j.1551-6709.2008.01003.x" rel="nofollow" target="_blank">Louwerse and Zwaan (2009)</a> showed that the frequency with which city names co-occur in English text predicts their actual geographical distances: cities close together on a map tend to be mentioned together more often, and an LSA model trained on text alone can reconstruct approximate maps of the United States without ever seeing one. <a href="https://doi.org/10.1111/j.1756-8765.2010.01106.x" rel="nofollow" target="_blank">Louwerse (2011)</a> extended this further, showing that text statistics encode not just geography but sensory properties, emotional associations and conceptual relationships across a wide range of domains. Indeed, distributional language statistics may track some sensorimotor properties of concepts (<a href="https://doi.org/10.17635/lancaster/thesis/1795" rel="nofollow" target="_blank">Bernabeu, 2022</a>; <a href="https://doi.org/10.1111/j.1551-6709.2010.01157.x" rel="nofollow" target="_blank">Louwerse &#038; Connell, 2011</a>; cf. 
<a href="https://doi.org/10.1038/s41562-025-02203-8" rel="nofollow" target="_blank">Xu et al., 2025</a>), especially after fine-tuning on human sensorimotor ratings (<a href="https://doi.org/10.48550/arXiv.2603.03313" rel="nofollow" target="_blank">Wu et al., 2026</a>). In short, language does not merely label the world – it encodes its structure, and even a simple co-occurrence model can read that encoding back.</p>
<p>We can see this for ourselves. The R code included below (click ‘Expand’ to view it) applies LSA – one of the simplest distributional models – to three text collections, projects the resulting word vectors into two dimensions via PCA (principal component analysis) and plots them. In brief, LSA builds a <em>term-document matrix</em> (a large table recording how often each word appears in each document), weights it with TF-IDF (term frequency–inverse document frequency, which highlights words distinctive to particular documents rather than ubiquitous everywhere) and then compresses it via <em>truncated SVD</em> (singular value decomposition, a form of dimensionality reduction). Each corpus is split into two groups: the most <em>distinctive</em> words per group (selected by the difference in mean TF-IDF weight between groups) are plotted in the group’s colour, while the most frequent <em>shared</em> words appear in purple. Words that co-occur in similar contexts cluster together; words from different domains drift apart.</p>
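<p>For intuition, TF-IDF can be computed by hand on a toy term-document matrix (invented counts, not the corpora analysed below; <code>tm::weightTfIdf</code> uses a normalised variant, but the logic is the same):</p>

```r
# Hand-rolled TF-IDF on a 3-term, 2-document toy matrix
tdm <- matrix(c(3, 0,   # "oil":    3 hits in d1, none in d2
                0, 2,   # "shares": none in d1, 2 in d2
                1, 1),  # "price":  once in each document
              nrow = 3, byrow = TRUE,
              dimnames = list(c("oil", "shares", "price"), c("d1", "d2")))
idf   <- log(ncol(tdm) / rowSums(tdm > 0))  # terms rare across documents score high
tfidf <- tdm * idf
tfidf  # "price" is weighted to zero: present everywhere, distinctive nowhere
```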
<p>PCA works by finding new axes – principal components – that capture the maximum variance in the data. Each word receives a <em>loading</em> on each component: a number ranging from −1 to +1 that indicates how strongly that word contributes to that axis of variation (a gentle introduction to PCA in R is <a href="https://pablobernabeu.github.io/2018/naive-principal-component-analysis-in-r" rel="nofollow" target="_blank">available in an earlier post on this blog</a>). High absolute loadings on a component mean that the word is a strong marker of the distinction that component captures.</p>
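<p>The range of the loadings is easy to verify: in base R’s <code>prcomp()</code>, they live in the columns of <code>$rotation</code>, each of which is a unit vector, so no entry can exceed ±1 (random toy data below, purely for illustration):</p>

```r
# Loadings from prcomp(): unit-length columns, so entries lie in [-1, 1]
set.seed(1)
x <- matrix(rnorm(40), ncol = 4)   # 10 observations, 4 variables
p <- prcomp(x, scale. = TRUE)
round(p$rotation[, 1:2], 3)        # loadings on PC1 and PC2
colSums(p$rotation^2)              # each column has squared length 1
```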
<p>How are the thematic groups decided? The code computes the mean TF-IDF weight of every word in each group of documents and then takes the difference. Words whose weight is much higher in group A than in group B are classified as distinctive to A, and vice versa. The top 15 words at each extreme become the coloured labels in the plot, while the most frequent words that do not belong to either extreme are labelled ‘Shared’. The grouping is therefore entirely data-driven: no human decides which words are ‘finance’ or ‘energy’ – the corpus statistics do. Above each plot, a table shows the mean loading of each thematic group on the first two principal components, with the highest positive loading per group highlighted in bold. A high absolute loading tells us that a given group of words is strongly aligned with that component – in other words, that the component captures precisely the distinction between those groups. When one group loads heavily on PC1 while another does not, the first principal component is essentially the axis that separates them.</p>
<div id="reuters-newswire-finance-vs-energy" class="section level3">
<h3>Reuters Newswire: Finance vs Energy</h3>
<p>The first corpus uses two classic newswire collections from the <code>tm</code> package (<a href="https://doi.org/10.18637/jss.v025.i05" rel="nofollow" target="_blank">Feinerer et al., 2008</a>): <code>acq</code> (50 Reuters articles on corporate acquisitions) and <code>crude</code> (20 articles on crude oil markets). Both have been standard NLP benchmarks since the 1980s (<a href="http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html" rel="nofollow" target="_blank">Lewis, 1997</a>). The code builds a TF-IDF weighted term-document matrix, reduces it to a 20-dimensional LSA space via truncated SVD, and computes pairwise cosine similarities – a standard measure of how close two word vectors sit, on a scale from –1 (opposite) to +1 (identical) – using <code>LSAfun::Cosine()</code> (<a href="https://doi.org/10.3758/s13428-015-0662-x" rel="nofollow" target="_blank">Günther et al., 2016</a>). The PCA loadings table and word-vector plot below show the results.</p>
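<p>Cosine similarity itself is a one-liner; a hand-rolled version (equivalent in spirit to <code>LSAfun::Cosine()</code>, though that function looks words up in a vector space rather than taking raw vectors) makes the scale explicit:</p>

```r
# Cosine similarity: dot product over the product of vector lengths
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
cosine(c(1, 2, 0), c(2, 4, 0))  # parallel vectors: 1
cosine(c(1, 0), c(0, 1))        # orthogonal vectors: 0
cosine(c(1, 0), c(-1, 0))       # opposite vectors: -1
```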
<pre>pkgs &lt;- c(&quot;LSAfun&quot;, &quot;tm&quot;, &quot;ggplot2&quot;, &quot;plotly&quot;)
invisible(lapply(pkgs, function(p)
  if (!requireNamespace(p, quietly = TRUE)) install.packages(p)))
library(LSAfun)
library(tm)
library(ggplot2)
library(plotly)

# --- Reusable helper: LSA + PCA plot ------------------------------------
# Builds a TF-IDF term-document matrix, computes a truncated SVD,
# selects the most distinctive and most shared words and projects them
# to 2D via PCA *on the selected words only* for maximum spread.

lsa_pipeline &lt;- function(doc_list, labels, grp_a, grp_b,
                         lab_a, lab_b, colour_a, colour_b,
                         top_n = 15, n_shared = 10,
                         k = 20, min_docs = 4) {
  corp &lt;- VCorpus(VectorSource(doc_list))
  corp &lt;- tm_map(corp, content_transformer(tolower))
  corp &lt;- tm_map(corp, removePunctuation)
  corp &lt;- tm_map(corp, removeNumbers)
  corp &lt;- tm_map(corp, removeWords, stopwords(&quot;en&quot;))
  corp &lt;- tm_map(corp, stripWhitespace)
  tdm  &lt;- as.matrix(TermDocumentMatrix(corp,
             control = list(weighting = weightTfIdf,
                            bounds = list(global = c(min_docs, Inf)))))
  k_use &lt;- min(as.integer(k), nrow(tdm) - 1L, ncol(tdm) - 1L)
  sv    &lt;- svd(tdm, nu = k_use, nv = k_use)
  wlsa  &lt;- sv$u %*% diag(sv$d[1:k_use])
  rownames(wlsa) &lt;- rownames(tdm)
  idx_a  &lt;- which(labels == grp_a)
  idx_b  &lt;- which(labels == grp_b)
  mean_a &lt;- rowMeans(tdm[, idx_a, drop = FALSE])
  mean_b &lt;- rowMeans(tdm[, idx_b, drop = FALSE])
  total  &lt;- mean_a + mean_b
  spec   &lt;- mean_a - mean_b           # positive = distinctive to A
  top_a  &lt;- names(sort(spec, decreasing = TRUE))[1:top_n]
  top_b  &lt;- names(sort(spec, decreasing = FALSE))[1:top_n]
  shared_pool &lt;- setdiff(names(sort(total, decreasing = TRUE)),
                         c(top_a, top_b))
  shared &lt;- head(shared_pool, n_shared)
  hl   &lt;- unique(c(top_a, top_b, shared))
  hl   &lt;- hl[hl %in% rownames(wlsa)]
  # PCA on the selected words only, for better spatial spread
  wlsa_hl &lt;- wlsa[hl, , drop = FALSE]
  pca  &lt;- prcomp(wlsa_hl, scale. = FALSE)
  cd   &lt;- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2],
                     word = rownames(wlsa_hl))
  cd$topic &lt;- ifelse(cd$word %in% top_a &#038; !cd$word %in% top_b, lab_a,
              ifelse(cd$word %in% top_b &#038; !cd$word %in% top_a, lab_b,
                     &quot;Shared&quot;))
  p &lt;- ggplot(cd, aes(PC1, PC2, colour = topic,
                      text = paste0(word, &quot; (&quot;, topic, &quot;)&quot;))) +
    geom_point(size = 0, alpha = 0) +
    scale_colour_manual(values = setNames(c(colour_a, colour_b, &quot;#7B2D8E&quot;),
                                          c(lab_a, lab_b, &quot;Shared&quot;)),
                        guide = guide_legend(override.aes = list(size = 3, alpha = 1))) +
    labs(x = &quot;Principal Component 1&quot;, y = &quot;Principal Component 2&quot;,
         colour = NULL) +
    theme_minimal(base_size = 12) +
    theme(legend.position = &quot;bottom&quot;,
          legend.margin   = margin(t = -5),
          axis.title.x    = element_text(margin = margin(t = 12)),
          axis.title.y    = element_text(margin = margin(r = 12)),
          plot.margin     = margin(0, 0, 0, 0))

  # Map each word to its group colour for label text
  col_map &lt;- setNames(c(colour_a, colour_b, &quot;#7B2D8E&quot;),
                      c(lab_a, lab_b, &quot;Shared&quot;))
  cd$label_col &lt;- col_map[cd$topic]

  # Trim spatial outliers so the dense cluster is readable.
  # Words beyond the IQR fence are dropped from the plot (not from LSA).
  q1  &lt;- quantile(cd$PC1, 0.25); q3 &lt;- quantile(cd$PC1, 0.75)
  iqr &lt;- q3 - q1; fence &lt;- 2.5
  keep &lt;- cd$PC1 &gt;= (q1 - fence * iqr) &#038; cd$PC1 &lt;= (q3 + fence * iqr)
  q1y &lt;- quantile(cd$PC2, 0.25); q3y &lt;- quantile(cd$PC2, 0.75)
  iqry &lt;- q3y - q1y
  keep &lt;- keep &#038; cd$PC2 &gt;= (q1y - fence * iqry) &#038; cd$PC2 &lt;= (q3y + fence * iqry)
  cd &lt;- cd[keep, , drop = FALSE]

  pp &lt;- ggplotly(p, tooltip = &quot;text&quot;)
  # Hide all ggplot traces from plot AND legend
  for (tr in seq_along(pp$x$data)) {   # tr, not k: avoid shadowing the k argument
    pp$x$data[[tr]]$marker$size    &lt;- 0.1
    pp$x$data[[tr]]$marker$opacity &lt;- 0
    pp$x$data[[tr]]$showlegend &lt;- FALSE
  }
  # Constrain axes to the data range (with a small pad)
  pad_x &lt;- diff(range(cd$PC1)) * 0.06
  pad_y &lt;- diff(range(cd$PC2)) * 0.06
  # Add text traces per group (toggleable via legend)
  legend_groups &lt;- c(lab_a, lab_b, &quot;Shared&quot;)
  legend_cols   &lt;- c(colour_a, colour_b, &quot;#7B2D8E&quot;)
  offscreen_x &lt;- max(cd$PC1) + pad_x * 50
  offscreen_y &lt;- max(cd$PC2) + pad_y * 50
  for (i in seq_along(legend_groups)) {
    grp &lt;- legend_groups[i]
    grp_data &lt;- cd[cd$topic == grp, , drop = FALSE]
    if (nrow(grp_data) == 0) next
    # Text trace at actual positions (no legend entry)
    pp &lt;- pp %&gt;% add_trace(
      x = grp_data$PC1, y = grp_data$PC2,
      type = &quot;scatter&quot;, mode = &quot;text&quot;,
      text = grp_data$word,
      textfont = list(size = 11, color = legend_cols[i]),
      name = grp, legendgroup = grp, showlegend = FALSE,
      hoverinfo = &quot;text&quot;,
      hovertext = paste0(grp_data$word, &quot; (&quot;, grp, &quot;)&quot;),
      inherit = FALSE
    )
    # Legend-only marker trace (off-screen, linked via legendgroup)
    pp &lt;- pp %&gt;% add_trace(
      x = offscreen_x, y = offscreen_y, type = &quot;scatter&quot;, mode = &quot;markers&quot;,
      marker = list(size = 12, color = legend_cols[i], opacity = 1,
                    symbol = &quot;circle&quot;),
      name = grp, legendgroup = grp, showlegend = TRUE,
      hoverinfo = &quot;skip&quot;, inherit = FALSE
    )
  }
  pp &lt;- pp %&gt;% layout(
    legend = list(orientation = &quot;h&quot;, x = 1, xanchor = &quot;right&quot;,
                  y = -0.12, tracegroupgap = 4, itemwidth = 30,
                  itemsizing = &quot;constant&quot;,
                  font = list(size = 12),
                  bordercolor = &quot;#CCCCCC&quot;, borderwidth = 1,
                  bgcolor = &quot;#FAFAFA&quot;,
                  xpad = 4, ypad = 10),
    xaxis = list(title = list(text = &quot;Principal Component 1&quot;,
                              standoff = 8),
                 range = c(min(cd$PC1) - pad_x, max(cd$PC1) + pad_x)),
    yaxis = list(title = list(text = &quot;Principal Component 2&quot;,
                              standoff = 8),
                 range = c(min(cd$PC2) - pad_y, max(cd$PC2) + pad_y)),
    margin = list(b = 60)
  )
  list(plot = pp, lsa = wlsa, tdm = tdm, pca = pca, words = cd)
}

# --- 1. Reuters newswire ------------------------------------------------
data(acq)
data(crude)

docs   &lt;- c(lapply(acq, content), lapply(crude, content))
labels &lt;- c(rep(&quot;acq&quot;, length(acq)), rep(&quot;crude&quot;, length(crude)))

res1 &lt;- lsa_pipeline(docs, labels,
  grp_a = &quot;acq&quot;, grp_b = &quot;crude&quot;,
  lab_a = &quot;Finance&quot;, lab_b = &quot;Energy&quot;,
  colour_a = &quot;#D55E00&quot;, colour_b = &quot;#0072B2&quot;,
  min_docs = 4)

# Cosine similarities in the 20-dimensional LSA space
pairs &lt;- list(
  c(&quot;oil&quot;, &quot;barrel&quot;), c(&quot;shares&quot;, &quot;acquisition&quot;),
  c(&quot;price&quot;, &quot;barrel&quot;), c(&quot;price&quot;, &quot;shares&quot;),
  c(&quot;shares&quot;, &quot;oil&quot;), c(&quot;acquisition&quot;, &quot;barrel&quot;))
pairs &lt;- Filter(function(p) all(p %in% rownames(res1$lsa)), pairs)
sims  &lt;- sapply(pairs, function(p)
  round(Cosine(p[1], p[2], tvectors = res1$lsa), 3))
names(sims) &lt;- sapply(pairs, paste, collapse = &quot; ~ &quot;)
sims
#&gt;         oil ~ barrel shares ~ acquisition       price ~ barrel 
#&gt;                0.675                0.236                0.938 
#&gt;       price ~ shares         shares ~ oil acquisition ~ barrel 
#&gt;                0.129               -0.014               -0.043</pre>
<table>
<caption><span id="tab:loadings-reuters">Table 1: </span><em>Mean PCA Loadings on the First Two Components (Highest Positive Loading per Group in Bold, Excluding Shared)</em></caption>
<thead>
<tr class="header">
<th align="left">Group</th>
<th align="right">PC1</th>
<th align="right">PC2</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">Energy</td>
<td align="right"><strong>.439</strong></td>
<td align="right">-.321</td>
</tr>
<tr class="even">
<td align="left">Finance</td>
<td align="right">-.265</td>
<td align="right"><strong>.240</strong></td>
</tr>
<tr class="odd">
<td align="left">Shared</td>
<td align="right">.172</td>
<td align="right">.069</td>
</tr>
</tbody>
</table>
<div class="figure" style="text-align: center"><span style="display:block;" id="fig:plot-reuters"></span>
<div class="plotly html-widget html-fill-item" id="htmlwidget-1" style="width:768px;height:576px;"></div>
<script type="application/json" data-for="htmlwidget-1">{"x":{"data":[{"x":[0.28836086654331949,0.1657808941561569,0.24753346914792965,0.15005897875816435,0.098896621624986067,0.14711428780722954,0.17444498328263369,0.10607858018124708,0.08051017201028017,0.086010212199783401,0.064786099141849379,0.10008936204826478,0.07478606229021495,0.10014917224483195,0.063693207340012561],"y":[-0.24832225144148024,-0.17896550697907548,-0.15490496801074929,-0.1413326394049457,-0.095827323182578472,-0.16973391719486938,0.58180820005241962,-0.11247257380227436,-0.046945776019420869,0.19094653091790281,-0.057862022883080608,-0.10955960843206995,-0.05905400025962592,-0.097092739690789923,-0.047815628974686934],"text":["oil (Energy)","prices (Energy)","crude (Energy)","opec (Energy)","saudi (Energy)","posted (Energy)","january (Energy)","barrel (Energy)","bpd (Energy)","barrels (Energy)","kuwait (Energy)","brings (Energy)","last (Energy)","price (Energy)","official (Energy)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(0,114,178,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(0,114,178,1)"}},"hoveron":"points","name":"Energy","legendgroup":"Energy","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.049471990550235677,-0.12938039843126439,-0.04054852888666817,-0.49755766520957062,-0.029560512980131078,-0.23995081133082855,-0.40137146185349359,-0.62856787483224308,-0.12214171077436271,-0.019085633888122941,-0.030038900048657918,-0.078196579784648168,-0.015783548788308083,-0.14240672348438954,-0.021505496545829967],"y":[0.17489942694276914,0.060093524586883142,0.1041949847841091,-0.084375547690265146,0.13320400170100985,-0.00012950003840865457,-0.040846894528412295,-0.18311056236070655,0.0097847356884903595,0.091889478898152022,0.061464943402219266,0.10292251974440762,0.04075723759281874,0.021965967367350753,0.049873493966069916],"text":["shares 
(Finance)","inc (Finance)","stock (Finance)","undisclosed (Finance)","common (Finance)","corp (Finance)","terms (Finance)","division (Finance)","american (Finance)","offer (Finance)","systems (Finance)","merger (Finance)","company (Finance)","purchase (Finance)","acquire (Finance)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(213,94,0,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(213,94,0,1)"}},"hoveron":"points","name":"Finance","legendgroup":"Finance","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.014353753633487835,0.0031696540302367932,0.18245817839139369,0.089094799129091809,0.083143851465848101,0.052368314320282293,0.027273491858881414,-0.007611952348244886,0.0092360271773049563,0.072496258220543938],"y":[0.11100612457286085,0.012538616626605862,-0.084348513417700846,0.29552983519442522,-0.071906345744260608,-0.042238035098445911,-0.0079807751232539113,0.002684444070366845,0.10176894169435273,-0.11250787752611267],"text":["mln (Shared)","dlrs (Shared)","west (Shared)","billion (Shared)","today (Shared)","market (Shared)","new (Shared)","will (Shared)","pct (Shared)","contract 
(Shared)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(123,45,142,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(123,45,142,1)"}},"hoveron":"points","name":"Shared","legendgroup":"Shared","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.049471990550235677,-0.12938039843126439,-0.04054852888666817,-0.029560512980131078,-0.23995081133082855,-0.12214171077436271,-0.019085633888122941,-0.030038900048657918,-0.078196579784648168,-0.015783548788308083,-0.14240672348438954,-0.021505496545829967],"y":[0.17489942694276914,0.060093524586883142,0.1041949847841091,0.13320400170100985,-0.00012950003840865457,0.0097847356884903595,0.091889478898152022,0.061464943402219266,0.10292251974440762,0.04075723759281874,0.021965967367350753,0.049873493966069916],"type":"scatter","mode":"text","text":["shares","inc","stock","common","corp","american","offer","systems","merger","company","purchase","acquire"],"textfont":{"size":11,"color":"#D55E00"},"name":"Finance","legendgroup":"Finance","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text"],"hovertext":["shares (Finance)","inc (Finance)","stock (Finance)","common (Finance)","corp (Finance)","american (Finance)","offer (Finance)","systems (Finance)","merger (Finance)","company (Finance)","purchase (Finance)","acquire 
(Finance)"],"marker":{"color":"rgba(31,119,180,1)","line":{"color":"rgba(31,119,180,1)"}},"error_y":{"color":"rgba(31,119,180,1)"},"error_x":{"color":"rgba(31,119,180,1)"},"line":{"color":"rgba(31,119,180,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[1.8732959001657634],"y":[1.9270860951021413],"type":"scatter","mode":"markers","marker":{"color":"#D55E00","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(255,127,14,1)"}},"name":"Finance","legendgroup":"Finance","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(255,127,14,1)"},"error_x":{"color":"rgba(255,127,14,1)"},"line":{"color":"rgba(255,127,14,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.28836086654331949,0.1657808941561569,0.24753346914792965,0.15005897875816435,0.098896621624986067,0.14711428780722954,0.10607858018124708,0.08051017201028017,0.086010212199783401,0.064786099141849379,0.10008936204826478,0.07478606229021495,0.10014917224483195,0.063693207340012561],"y":[-0.24832225144148024,-0.17896550697907548,-0.15490496801074929,-0.1413326394049457,-0.095827323182578472,-0.16973391719486938,-0.11247257380227436,-0.046945776019420869,0.19094653091790281,-0.057862022883080608,-0.10955960843206995,-0.05905400025962592,-0.097092739690789923,-0.047815628974686934],"type":"scatter","mode":"text","text":["oil","prices","crude","opec","saudi","posted","barrel","bpd","barrels","kuwait","brings","last","price","official"],"textfont":{"size":11,"color":"#0072B2"},"name":"Energy","legendgroup":"Energy","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"hovertext":["oil (Energy)","prices (Energy)","crude (Energy)","opec (Energy)","saudi (Energy)","posted (Energy)","barrel (Energy)","bpd (Energy)","barrels (Energy)","kuwait (Energy)","brings (Energy)","last (Energy)","price (Energy)","official 
(Energy)"],"marker":{"color":"rgba(44,160,44,1)","line":{"color":"rgba(44,160,44,1)"}},"error_y":{"color":"rgba(44,160,44,1)"},"error_x":{"color":"rgba(44,160,44,1)"},"line":{"color":"rgba(44,160,44,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[1.8732959001657634],"y":[1.9270860951021413],"type":"scatter","mode":"markers","marker":{"color":"#0072B2","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(214,39,40,1)"}},"name":"Energy","legendgroup":"Energy","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(214,39,40,1)"},"error_x":{"color":"rgba(214,39,40,1)"},"line":{"color":"rgba(214,39,40,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[-0.014353753633487835,0.0031696540302367932,0.18245817839139369,0.089094799129091809,0.083143851465848101,0.052368314320282293,0.027273491858881414,-0.007611952348244886,0.0092360271773049563,0.072496258220543938],"y":[0.11100612457286085,0.012538616626605862,-0.084348513417700846,0.29552983519442522,-0.071906345744260608,-0.042238035098445911,-0.0079807751232539113,0.002684444070366845,0.10176894169435273,-0.11250787752611267],"type":"scatter","mode":"text","text":["mln","dlrs","west","billion","today","market","new","will","pct","contract"],"textfont":{"size":11,"color":"#7B2D8E"},"name":"Shared","legendgroup":"Shared","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text"],"hovertext":["mln (Shared)","dlrs (Shared)","west (Shared)","billion (Shared)","today (Shared)","market (Shared)","new (Shared)","will (Shared)","pct (Shared)","contract 
(Shared)"],"marker":{"color":"rgba(148,103,189,1)","line":{"color":"rgba(148,103,189,1)"}},"error_y":{"color":"rgba(148,103,189,1)"},"error_x":{"color":"rgba(148,103,189,1)"},"line":{"color":"rgba(148,103,189,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[1.8732959001657634],"y":[1.9270860951021413],"type":"scatter","mode":"markers","marker":{"color":"#7B2D8E","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(140,86,75,1)"}},"name":"Shared","legendgroup":"Shared","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(140,86,75,1)"},"error_x":{"color":"rgba(140,86,75,1)"},"line":{"color":"rgba(140,86,75,1)"},"xaxis":"x","yaxis":"y","frame":null}],"layout":{"margin":{"t":16,"r":0,"b":60,"l":45.429638854296407},"paper_bgcolor":"rgba(255,255,255,1)","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"xaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-0.27164951200327742,0.32005956721576839],"tickmode":"array","ticktext":["-0.50","-0.25","0.00","0.25"],"tickvals":[-0.5,-0.25,0,0.25],"categoryorder":"array","categoryarray":["-0.50","-0.25","0.00","0.25"],"nticks":null,"ticks":"","tickcolor":null,"ticklen":3.9850560398505608,"tickwidth":0,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":12.7521793275218},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(235,235,235,1)","gridwidth":0,"zeroline":false,"anchor":"y","title":{"text":"Principal Component 
1","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"standoff":8},"hoverformat":".2f"},"yaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-0.28095337663963454,0.32816096039257953],"tickmode":"array","ticktext":["-0.2","0.0","0.2","0.4","0.6"],"tickvals":[-0.20000000000000001,0,0.20000000000000001,0.40000000000000002,0.60000000000000009],"categoryorder":"array","categoryarray":["-0.2","0.0","0.2","0.4","0.6"],"nticks":null,"ticks":"","tickcolor":null,"ticklen":3.9850560398505608,"tickwidth":0,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":12.7521793275218},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(235,235,235,1)","gridwidth":0,"zeroline":false,"anchor":"x","title":{"text":"Principal Component 2","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"standoff":8},"hoverformat":".2f"},"shapes":[{"type":"rect","fillcolor":null,"line":{"color":null,"width":0,"linetype":[]},"yref":"paper","xref":"paper","layer":"below","x0":0,"x1":1,"y0":0,"y1":1}],"showlegend":true,"legend":{"bgcolor":"#FAFAFA","bordercolor":"#CCCCCC","borderwidth":1,"font":{"color":"rgba(0,0,0,1)","family":"","size":12},"title":{"text":"","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243}},"orientation":"h","x":1,"xanchor":"right","y":-0.12,"tracegroupgap":4,"itemwidth":30,"itemsizing":"constant","xpad":4,"ypad":10},"hovermode":"closest","barmode":"relative"},"config":{"doubleClick":"reset","modeBarButtonsToAdd":["hoverclosest","hovercompare"],"showSendToCloud":false},"source":"A","attrs":{"2d186eda39f9":{"x":{},"y":{},"colour":{},"text":{},"type":"scatter"},"2d186eda39f9.1":{"x":[-0.049471990550235677,-0.12938039843126439,-0.04054852888666817,-0.029560512980131078,-0.23995081133082855,-0.12214171077436271,-0.019085633888122941,-0.030038900048657918,-0.078196579784648168,-0.015783548788308083,-0.14240672348438954,-0.021505496
545829967],"y":[0.17489942694276914,0.060093524586883142,0.1041949847841091,0.13320400170100985,-0.00012950003840865457,0.0097847356884903595,0.091889478898152022,0.061464943402219266,0.10292251974440762,0.04075723759281874,0.021965967367350753,0.049873493966069916],"type":"scatter","mode":"text","text":["shares","inc","stock","common","corp","american","offer","systems","merger","company","purchase","acquire"],"textfont":{"size":11,"color":"#D55E00"},"name":"Finance","legendgroup":"Finance","showlegend":false,"hoverinfo":"text","hovertext":["shares (Finance)","inc (Finance)","stock (Finance)","common (Finance)","corp (Finance)","american (Finance)","offer (Finance)","systems (Finance)","merger (Finance)","company (Finance)","purchase (Finance)","acquire (Finance)"],"inherit":false},"2d186eda39f9.2":{"x":1.8732959001657634,"y":1.9270860951021413,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#D55E00","opacity":1,"symbol":"circle"},"name":"Finance","legendgroup":"Finance","showlegend":true,"hoverinfo":"skip","inherit":false},"2d186eda39f9.3":{"x":[0.28836086654331949,0.1657808941561569,0.24753346914792965,0.15005897875816435,0.098896621624986067,0.14711428780722954,0.10607858018124708,0.08051017201028017,0.086010212199783401,0.064786099141849379,0.10008936204826478,0.07478606229021495,0.10014917224483195,0.063693207340012561],"y":[-0.24832225144148024,-0.17896550697907548,-0.15490496801074929,-0.1413326394049457,-0.095827323182578472,-0.16973391719486938,-0.11247257380227436,-0.046945776019420869,0.19094653091790281,-0.057862022883080608,-0.10955960843206995,-0.05905400025962592,-0.097092739690789923,-0.047815628974686934],"type":"scatter","mode":"text","text":["oil","prices","crude","opec","saudi","posted","barrel","bpd","barrels","kuwait","brings","last","price","official"],"textfont":{"size":11,"color":"#0072B2"},"name":"Energy","legendgroup":"Energy","showlegend":false,"hoverinfo":"text","hovertext":["oil (Energy)","prices (Energy)","crude 
(Energy)","opec (Energy)","saudi (Energy)","posted (Energy)","barrel (Energy)","bpd (Energy)","barrels (Energy)","kuwait (Energy)","brings (Energy)","last (Energy)","price (Energy)","official (Energy)"],"inherit":false},"2d186eda39f9.4":{"x":1.8732959001657634,"y":1.9270860951021413,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#0072B2","opacity":1,"symbol":"circle"},"name":"Energy","legendgroup":"Energy","showlegend":true,"hoverinfo":"skip","inherit":false},"2d186eda39f9.5":{"x":[-0.014353753633487835,0.0031696540302367932,0.18245817839139369,0.089094799129091809,0.083143851465848101,0.052368314320282293,0.027273491858881414,-0.007611952348244886,0.0092360271773049563,0.072496258220543938],"y":[0.11100612457286085,0.012538616626605862,-0.084348513417700846,0.29552983519442522,-0.071906345744260608,-0.042238035098445911,-0.0079807751232539113,0.002684444070366845,0.10176894169435273,-0.11250787752611267],"type":"scatter","mode":"text","text":["mln","dlrs","west","billion","today","market","new","will","pct","contract"],"textfont":{"size":11,"color":"#7B2D8E"},"name":"Shared","legendgroup":"Shared","showlegend":false,"hoverinfo":"text","hovertext":["mln (Shared)","dlrs (Shared)","west (Shared)","billion (Shared)","today (Shared)","market (Shared)","new (Shared)","will (Shared)","pct (Shared)","contract (Shared)"],"inherit":false},"2d186eda39f9.6":{"x":1.8732959001657634,"y":1.9270860951021413,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#7B2D8E","opacity":1,"symbol":"circle"},"name":"Shared","legendgroup":"Shared","showlegend":true,"hoverinfo":"skip","inherit":false}},"cur_data":"2d186eda39f9","visdat":{"2d186eda39f9":["function (y) 
","x"]},"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.20000000000000001,"selected":{"opacity":1},"debounce":0},"shinyEvents":["plotly_hover","plotly_click","plotly_selected","plotly_relayout","plotly_brushed","plotly_brushing","plotly_clickannotation","plotly_doubleclick","plotly_deselect","plotly_afterplot","plotly_sunburstclick"],"base_url":"https://plot.ly"},"evals":[],"jsHooks":[]}</script>
<p class="caption">
Figure 1: <em>Word Vectors from Reuters Newswire Articles (Finance vs Energy) Projected to Two Dimensions via PCA on a 20-Dimensional LSA Space.</em> Finance terms (vermillion) cluster in a distinct region from energy terms (blue); shared vocabulary occupies intermediate positions. Select an area of the plot to zoom in; double-click to reset.
</p>
</div>
<p>The cosine similarities confirm what Figure 1 shows geometrically. Within-domain pairs cluster tightly – <code>oil ~ barrel</code> and <code>price ~ barrel</code> have high positive cosines because these words habitually appear together in oil-market dispatches – while cross-domain pairs like <code>shares ~ oil</code> and <code>acquisition ~ barrel</code> sit near zero: they simply never keep each other’s company. Notice, too, that <code>price ~ shares</code> is far lower than <code>price ~ barrel</code>. The same word, ‘price’, lands in a different region of the space depending on the context in which it predominantly occurs. Firth’s principle, made numerical.</p>
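<p>As a sketch of how these pairwise numbers are computed – assuming the word vectors from the 20-dimensional LSA space sit in a matrix, here called <code>term_vectors</code>, with terms as row names (a stand-in name; use whatever object the pipeline above actually produces) – the cosine is simply the normalised dot product:</p>
<pre># Cosine similarity between two word vectors: the dot product
# of the vectors divided by the product of their norms
cosine_sim &lt;- function(v1, v2) {
  sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
}

cosine_sim(term_vectors[&quot;oil&quot;, ], term_vectors[&quot;barrel&quot;, ])    # within-domain: high
cosine_sim(term_vectors[&quot;shares&quot;, ], term_vectors[&quot;oil&quot;, ])    # cross-domain: near zero</pre>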
</div>
<div id="state-of-the-union-pre-war-vs-post-war" class="section level3">
<h3>State of the Union: Pre-War vs Post-War</h3>
<p>From newswire to politics. The <code>sotu</code> package provides the full text of every US State of the Union address. Splitting at 1945 – the end of the Second World War – reveals how American political vocabulary has shifted over two centuries: from the constitutional and agrarian language of the early republic to the geopolitical and welfare-state vocabulary of the modern era. The loadings table and figure below present the results.</p>
<pre>if (!requireNamespace(&quot;sotu&quot;, quietly = TRUE)) install.packages(&quot;sotu&quot;)

# Full text and year of every State of the Union address
sotu_texts  &lt;- sotu::sotu_text
sotu_years  &lt;- sotu::sotu_meta$year

# Label each address by era, splitting at the end of the Second World War
sotu_labels &lt;- ifelse(sotu_years &lt; 1945, &quot;Pre-1945&quot;, &quot;Post-1945&quot;)

res2 &lt;- lsa_pipeline(as.list(sotu_texts), sotu_labels,
  grp_a = &quot;Pre-1945&quot;, grp_b = &quot;Post-1945&quot;,
  lab_a = &quot;Pre-1945&quot;, lab_b = &quot;Post-1945&quot;,
  colour_a = &quot;#E69F00&quot;, colour_b = &quot;#009E73&quot;,
  min_docs = 5)</pre>
<table>
<caption><span id="tab:loadings-sotu">Table 2: </span><em>Mean PCA Loadings on the First Two Components (Highest Positive Loading per Group in Bold, Excluding Shared)</em></caption>
<thead>
<tr class="header">
<th align="left">Group</th>
<th align="right">PC1</th>
<th align="right">PC2</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">Pre-1945</td>
<td align="right">-.845</td>
<td align="right">-.499</td>
</tr>
<tr class="even">
<td align="left">Post-1945</td>
<td align="right">-.492</td>
<td align="right"><strong>.385</strong></td>
</tr>
<tr class="odd">
<td align="left">Shared</td>
<td align="right">-.379</td>
<td align="right">.132</td>
</tr>
</tbody>
</table>
<div class="figure" style="text-align: center"><span style="display:block;" id="fig:plot-sotu"></span>
<div class="plotly html-widget html-fill-item" id="htmlwidget-2" style="width:768px;height:576px;"></div>
<script type="application/json" data-for="htmlwidget-2">{"x":{"data":[{"x":[0.013287752989669022,0.0015151078484123299,-0.0041639927809794999,0.0024329863389495254,-0.0064838415639756064,-0.00029656493302945364,-0.0050786664995796796,-0.0067667033813527122,-0.004428015965645557,-0.0038366409369963799,-0.0039974776035780419,-0.0058161097063427021,-0.0026112723380078925,-0.0024394203616021188,-0.0046482674401974854],"y":[0.041695332466203217,0.024383169693297253,0.025875377447865128,0.019007361389868749,0.0073319475628275572,0.032911663123940386,0.013545482031131987,0.0041182589008741683,0.010159611319146701,0.0057197970782016262,0.0085645354755251781,0.0029768676043938286,0.016927014200524924,0.0045156386563725724,0.0015016816366281121],"text":["tonight (Post-1945)","jobs (Post-1945)","weve (Post-1945)","americans (Post-1945)","programs (Post-1945)","thats (Post-1945)","help (Post-1945)","program (Post-1945)","budget (Post-1945)","billion (Post-1945)","percent (Post-1945)","soviet (Post-1945)","lets (Post-1945)","nuclear (Post-1945)","economic 
(Post-1945)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(0,158,115,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(0,158,115,1)"}},"hoveron":"points","name":"Post-1945","legendgroup":"Post-1945","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.0059299883753514354,-0.0060110853326906088,-0.0059168280717346849,-0.0059782667234740223,-0.0056107677277599062,-0.0058874592129681599,-0.0055614503444968372,-0.0058008983306079035,-0.0054766275528193324,-0.0055163152625125672,-0.0057213661109132528,-0.0058775943513935504,-0.0054604074907543161,-0.0056906537251830026,-0.0053566471442892177],"y":[-0.016154997431639274,-0.01808959551024486,-0.01698903460044646,-0.016988323911570406,-0.015778337357642263,-0.016229671682749129,-0.015776979601598757,-0.016145812067941891,-0.019396001512634261,-0.014690600254520521,-0.015794162858600122,-0.016220683825330474,-0.01650308897215079,-0.015510408718959881,-0.015930201436391997],"text":["upon (Pre-1945)","spain (Pre-1945)","vessels (Pre-1945)","british (Pre-1945)","mexico (Pre-1945)","subject (Pre-1945)","treasury (Pre-1945)","commerce (Pre-1945)","gentlemen (Pre-1945)","cent (Pre-1945)","duties (Pre-1945)","militia (Pre-1945)","treaty (Pre-1945)","consideration (Pre-1945)","territory 
(Pre-1945)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(230,159,0,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(230,159,0,1)"}},"hoveron":"points","name":"Pre-1945","legendgroup":"Pre-1945","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.0017478123280863518,0.00068511788801251926,0.022229517128775089,-0.0039339280350563048,-0.004657205911149263,-0.0032278109318195618,-0.0034786200235382152,-0.0014146489141031284,0.11738841084164225,-0.0027155376234720098],"y":[0.0067845141778160936,0.0070376565360968212,0.0066652305444180431,0.0024190192649280085,0.0050877343084827949,0.0010699557114107658,0.00085342072833027415,0.0080398796414144023,-0.014350851383924532,0.0033576016266469869],"text":["america (Shared)","americas (Shared)","thank (Shared)","tax (Shared)","spending (Shared)","million (Shared)","today (Shared)","get (Shared)","applause (Shared)","workers 
(Shared)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(123,45,142,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(123,45,142,1)"}},"hoveron":"points","name":"Shared","legendgroup":"Shared","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.0059299883753514354,-0.0060110853326906088,-0.0059168280717346849,-0.0059782667234740223,-0.0056107677277599062,-0.0058874592129681599,-0.0055614503444968372,-0.0058008983306079035,-0.0054766275528193324,-0.0055163152625125672,-0.0057213661109132528,-0.0058775943513935504,-0.0054604074907543161,-0.0056906537251830026,-0.0053566471442892177],"y":[-0.016154997431639274,-0.01808959551024486,-0.01698903460044646,-0.016988323911570406,-0.015778337357642263,-0.016229671682749129,-0.015776979601598757,-0.016145812067941891,-0.019396001512634261,-0.014690600254520521,-0.015794162858600122,-0.016220683825330474,-0.01650308897215079,-0.015510408718959881,-0.015930201436391997],"type":"scatter","mode":"text","text":["upon","spain","vessels","british","mexico","subject","treasury","commerce","gentlemen","cent","duties","militia","treaty","consideration","territory"],"textfont":{"size":11,"color":"#E69F00"},"name":"Pre-1945","legendgroup":"Pre-1945","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"hovertext":["upon (Pre-1945)","spain (Pre-1945)","vessels (Pre-1945)","british (Pre-1945)","mexico (Pre-1945)","subject (Pre-1945)","treasury (Pre-1945)","commerce (Pre-1945)","gentlemen (Pre-1945)","cent (Pre-1945)","duties (Pre-1945)","militia (Pre-1945)","treaty (Pre-1945)","consideration (Pre-1945)","territory 
(Pre-1945)"],"marker":{"color":"rgba(31,119,180,1)","line":{"color":"rgba(31,119,180,1)"}},"error_y":{"color":"rgba(31,119,180,1)"},"error_x":{"color":"rgba(31,119,180,1)"},"line":{"color":"rgba(31,119,180,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.030032055499856242],"y":[0.18983465703366431],"type":"scatter","mode":"markers","marker":{"color":"#E69F00","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(255,127,14,1)"}},"name":"Pre-1945","legendgroup":"Pre-1945","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(255,127,14,1)"},"error_x":{"color":"rgba(255,127,14,1)"},"line":{"color":"rgba(255,127,14,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.0015151078484123299,-0.0041639927809794999,0.0024329863389495254,-0.0064838415639756064,-0.00029656493302945364,-0.0050786664995796796,-0.0067667033813527122,-0.004428015965645557,-0.0038366409369963799,-0.0039974776035780419,-0.0058161097063427021,-0.0026112723380078925,-0.0024394203616021188,-0.0046482674401974854],"y":[0.024383169693297253,0.025875377447865128,0.019007361389868749,0.0073319475628275572,0.032911663123940386,0.013545482031131987,0.0041182589008741683,0.010159611319146701,0.0057197970782016262,0.0085645354755251781,0.0029768676043938286,0.016927014200524924,0.0045156386563725724,0.0015016816366281121],"type":"scatter","mode":"text","text":["jobs","weve","americans","programs","thats","help","program","budget","billion","percent","soviet","lets","nuclear","economic"],"textfont":{"size":11,"color":"#009E73"},"name":"Post-1945","legendgroup":"Post-1945","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"hovertext":["jobs (Post-1945)","weve (Post-1945)","americans (Post-1945)","programs (Post-1945)","thats (Post-1945)","help (Post-1945)","program (Post-1945)","budget (Post-1945)","billion (Post-1945)","percent (Post-1945)","soviet (Post-1945)","lets (Post-1945)","nuclear (Post-1945)","economic 
(Post-1945)"],"marker":{"color":"rgba(44,160,44,1)","line":{"color":"rgba(44,160,44,1)"}},"error_y":{"color":"rgba(44,160,44,1)"},"error_x":{"color":"rgba(44,160,44,1)"},"line":{"color":"rgba(44,160,44,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.030032055499856242],"y":[0.18983465703366431],"type":"scatter","mode":"markers","marker":{"color":"#009E73","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(214,39,40,1)"}},"name":"Post-1945","legendgroup":"Post-1945","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(214,39,40,1)"},"error_x":{"color":"rgba(214,39,40,1)"},"line":{"color":"rgba(214,39,40,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[-0.0017478123280863518,0.00068511788801251926,-0.0039339280350563048,-0.004657205911149263,-0.0032278109318195618,-0.0034786200235382152,-0.0014146489141031284,-0.0027155376234720098],"y":[0.0067845141778160936,0.0070376565360968212,0.0024190192649280085,0.0050877343084827949,0.0010699557114107658,0.00085342072833027415,0.0080398796414144023,0.0033576016266469869],"type":"scatter","mode":"text","text":["america","americas","tax","spending","million","today","get","workers"],"textfont":{"size":11,"color":"#7B2D8E"},"name":"Shared","legendgroup":"Shared","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text"],"hovertext":["america (Shared)","americas (Shared)","tax (Shared)","spending (Shared)","million (Shared)","today (Shared)","get (Shared)","workers 
(Shared)"],"marker":{"color":"rgba(148,103,189,1)","line":{"color":"rgba(148,103,189,1)"}},"error_y":{"color":"rgba(148,103,189,1)"},"error_x":{"color":"rgba(148,103,189,1)"},"line":{"color":"rgba(148,103,189,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.030032055499856242],"y":[0.18983465703366431],"type":"scatter","mode":"markers","marker":{"color":"#7B2D8E","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(140,86,75,1)"}},"name":"Shared","legendgroup":"Shared","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(140,86,75,1)"},"error_x":{"color":"rgba(140,86,75,1)"},"line":{"color":"rgba(140,86,75,1)"},"xaxis":"x","yaxis":"y","frame":null}],"layout":{"margin":{"t":16,"r":0,"b":60,"l":51.8057285180573},"paper_bgcolor":"rgba(255,255,255,1)","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"xaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-0.0073186847645708465,0.0029849677221676597],"tickmode":"array","ticktext":["0.00","0.04","0.08","0.12"],"tickvals":[0,0.040000000000000008,0.080000000000000002,0.12000000000000001],"categoryorder":"array","categoryarray":["0.00","0.04","0.08","0.12"],"nticks":null,"ticks":"","tickcolor":null,"ticklen":3.9850560398505608,"tickwidth":0,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":12.7521793275218},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(235,235,235,1)","gridwidth":0,"zeroline":false,"anchor":"y","title":{"text":"Principal Component 
1","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"standoff":8},"hoverformat":".2f"},"yaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-0.02253446139082874,0.036050123002134865],"tickmode":"array","ticktext":["-0.02","0.00","0.02","0.04"],"tickvals":[-0.02,0,0.019999999999999997,0.039999999999999987],"categoryorder":"array","categoryarray":["-0.02","0.00","0.02","0.04"],"nticks":null,"ticks":"","tickcolor":null,"ticklen":3.9850560398505608,"tickwidth":0,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":12.7521793275218},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(235,235,235,1)","gridwidth":0,"zeroline":false,"anchor":"x","title":{"text":"Principal Component 2","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"standoff":8},"hoverformat":".2f"},"shapes":[{"type":"rect","fillcolor":null,"line":{"color":null,"width":0,"linetype":[]},"yref":"paper","xref":"paper","layer":"below","x0":0,"x1":1,"y0":0,"y1":1}],"showlegend":true,"legend":{"bgcolor":"#FAFAFA","bordercolor":"#CCCCCC","borderwidth":1,"font":{"color":"rgba(0,0,0,1)","family":"","size":12},"title":{"text":"","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243}},"orientation":"h","x":1,"xanchor":"right","y":-0.12,"tracegroupgap":4,"itemwidth":30,"itemsizing":"constant","xpad":4,"ypad":10},"hovermode":"closest","barmode":"relative"},"config":{"doubleClick":"reset","modeBarButtonsToAdd":["hoverclosest","hovercompare"],"showSendToCloud":false},"source":"A","attrs":{"2d181f60119":{"x":{},"y":{},"colour":{},"text":{},"type":"scatter"},"2d181f60119.1":{"x":[-0.0059299883753514354,-0.0060110853326906088,-0.0059168280717346849,-0.0059782667234740223,-0.0056107677277599062,-0.0058874592129681599,-0.0055614503444968372,-0.0058008983306079035,-0.0054766275528193324,-0.0055163152625125672,-0.0057213661109132528,-0.0058775943513935504,-0.00546040
74907543161,-0.0056906537251830026,-0.0053566471442892177],"y":[-0.016154997431639274,-0.01808959551024486,-0.01698903460044646,-0.016988323911570406,-0.015778337357642263,-0.016229671682749129,-0.015776979601598757,-0.016145812067941891,-0.019396001512634261,-0.014690600254520521,-0.015794162858600122,-0.016220683825330474,-0.01650308897215079,-0.015510408718959881,-0.015930201436391997],"type":"scatter","mode":"text","text":["upon","spain","vessels","british","mexico","subject","treasury","commerce","gentlemen","cent","duties","militia","treaty","consideration","territory"],"textfont":{"size":11,"color":"#E69F00"},"name":"Pre-1945","legendgroup":"Pre-1945","showlegend":false,"hoverinfo":"text","hovertext":["upon (Pre-1945)","spain (Pre-1945)","vessels (Pre-1945)","british (Pre-1945)","mexico (Pre-1945)","subject (Pre-1945)","treasury (Pre-1945)","commerce (Pre-1945)","gentlemen (Pre-1945)","cent (Pre-1945)","duties (Pre-1945)","militia (Pre-1945)","treaty (Pre-1945)","consideration (Pre-1945)","territory 
(Pre-1945)"],"inherit":false},"2d181f60119.2":{"x":0.030032055499856242,"y":0.18983465703366431,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#E69F00","opacity":1,"symbol":"circle"},"name":"Pre-1945","legendgroup":"Pre-1945","showlegend":true,"hoverinfo":"skip","inherit":false},"2d181f60119.3":{"x":[0.0015151078484123299,-0.0041639927809794999,0.0024329863389495254,-0.0064838415639756064,-0.00029656493302945364,-0.0050786664995796796,-0.0067667033813527122,-0.004428015965645557,-0.0038366409369963799,-0.0039974776035780419,-0.0058161097063427021,-0.0026112723380078925,-0.0024394203616021188,-0.0046482674401974854],"y":[0.024383169693297253,0.025875377447865128,0.019007361389868749,0.0073319475628275572,0.032911663123940386,0.013545482031131987,0.0041182589008741683,0.010159611319146701,0.0057197970782016262,0.0085645354755251781,0.0029768676043938286,0.016927014200524924,0.0045156386563725724,0.0015016816366281121],"type":"scatter","mode":"text","text":["jobs","weve","americans","programs","thats","help","program","budget","billion","percent","soviet","lets","nuclear","economic"],"textfont":{"size":11,"color":"#009E73"},"name":"Post-1945","legendgroup":"Post-1945","showlegend":false,"hoverinfo":"text","hovertext":["jobs (Post-1945)","weve (Post-1945)","americans (Post-1945)","programs (Post-1945)","thats (Post-1945)","help (Post-1945)","program (Post-1945)","budget (Post-1945)","billion (Post-1945)","percent (Post-1945)","soviet (Post-1945)","lets (Post-1945)","nuclear (Post-1945)","economic 
(Post-1945)"],"inherit":false},"2d181f60119.4":{"x":0.030032055499856242,"y":0.18983465703366431,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#009E73","opacity":1,"symbol":"circle"},"name":"Post-1945","legendgroup":"Post-1945","showlegend":true,"hoverinfo":"skip","inherit":false},"2d181f60119.5":{"x":[-0.0017478123280863518,0.00068511788801251926,-0.0039339280350563048,-0.004657205911149263,-0.0032278109318195618,-0.0034786200235382152,-0.0014146489141031284,-0.0027155376234720098],"y":[0.0067845141778160936,0.0070376565360968212,0.0024190192649280085,0.0050877343084827949,0.0010699557114107658,0.00085342072833027415,0.0080398796414144023,0.0033576016266469869],"type":"scatter","mode":"text","text":["america","americas","tax","spending","million","today","get","workers"],"textfont":{"size":11,"color":"#7B2D8E"},"name":"Shared","legendgroup":"Shared","showlegend":false,"hoverinfo":"text","hovertext":["america (Shared)","americas (Shared)","tax (Shared)","spending (Shared)","million (Shared)","today (Shared)","get (Shared)","workers (Shared)"],"inherit":false},"2d181f60119.6":{"x":0.030032055499856242,"y":0.18983465703366431,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#7B2D8E","opacity":1,"symbol":"circle"},"name":"Shared","legendgroup":"Shared","showlegend":true,"hoverinfo":"skip","inherit":false}},"cur_data":"2d181f60119","visdat":{"2d181f60119":["function (y) ","x"]},"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.20000000000000001,"selected":{"opacity":1},"debounce":0},"shinyEvents":["plotly_hover","plotly_click","plotly_selected","plotly_relayout","plotly_brushed","plotly_brushing","plotly_clickannotation","plotly_doubleclick","plotly_deselect","plotly_afterplot","plotly_sunburstclick"],"base_url":"https://plot.ly"},"evals":[],"jsHooks":[]}</script>
<p class="caption">
Figure 2: <em>Word Vectors from US State of the Union Addresses Projected to Two Dimensions, Split at 1945.</em> Pre-war speeches (amber) feature constitutional and agrarian vocabulary; post-war speeches (green) shift to geopolitical and welfare-state terms. Select an area of the plot to zoom in; double-click to reset.
</p>
</div>
<p>The separation is striking. Table 2 reveals that both groups have negative mean loadings on PC1, so the first component does not cleanly separate them – it mainly captures variance shared across eras (general political vocabulary that appears throughout the full 200-year span). The real separation lives on PC2: pre-war words load negatively while post-war words load positively, confirming that the vertical axis in Figure 2 is the one that distinguishes the two eras. Pre-war presidents address ‘gentlemen’ (the formal salutation of a different era) and discuss ‘vessels’, ‘militia’, ‘commerce’ and ‘treasury’ – the vocabulary of a young republic preoccupied with trade, territorial expansion and the mechanics of governance. Modern presidents speak of ‘tonight’ (State of the Union addresses have been delivered on prime-time television since the mid-1960s), ‘jobs’, ‘nuclear’ and ‘program’ – the vocabulary of a superpower managing a welfare state and a global military presence. Words like ‘america’, ‘today’, ‘tax’ and ‘workers’ anchor both eras, sitting in the shared middle ground.</p>
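<p>The mean loadings in Table 2 are straightforward to recompute – assuming, hypothetically, that <code>res2</code> exposes a per-word data frame of loadings with a <code>group</code> column (adapt the field names to whatever <code>lsa_pipeline()</code> actually returns):</p>
<pre># Mean PC1/PC2 loading per group, as reported in Table 2
# (res2$loadings and its columns are assumed names, not a documented API)
aggregate(cbind(PC1, PC2) ~ group, data = res2$loadings, FUN = mean)</pre>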
</div>
<div id="imdb-film-reviews-positive-vs-negative" class="section level3">
<h3>IMDB Film Reviews: Positive vs Negative</h3>
<p>Now a harder test. The <code>text2vec</code> package includes 5,000 IMDB film reviews labelled as positive or negative – a classic sentiment-analysis benchmark. Unlike the two corpora above, the split here is not by topic but by evaluative tone. Both positive and negative reviews discuss films, characters, plots and acting; the difference lies in the adjectives and evaluative phrasing. This makes the separation task far harder for a simple co-occurrence model – and the result is instructive. The loadings table and figure below present the results.</p>
<pre>if (!requireNamespace(&quot;text2vec&quot;, quietly = TRUE)) install.packages(&quot;text2vec&quot;)

# 5,000 IMDB reviews with a binary sentiment label (1 = positive)
data(&quot;movie_review&quot;, package = &quot;text2vec&quot;)
mv_labels &lt;- ifelse(movie_review$sentiment == 1, &quot;Positive&quot;, &quot;Negative&quot;)

res3 &lt;- lsa_pipeline(as.list(movie_review$review), mv_labels,
  grp_a = &quot;Positive&quot;, grp_b = &quot;Negative&quot;,
  lab_a = &quot;Positive&quot;, lab_b = &quot;Negative&quot;,
  colour_a = &quot;#009E73&quot;, colour_b = &quot;#D55E00&quot;,
  min_docs = 50)</pre>
<table>
<caption><span id="tab:loadings-imdb">Table 3: </span><em>Mean PCA Loadings on the First Two Components (Highest Positive Loading per Group in Bold, Excluding Shared)</em></caption>
<thead>
<tr class="header">
<th align="left">Group</th>
<th align="right">PC1</th>
<th align="right">PC2</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">Negative</td>
<td align="right"><strong>.329</strong></td>
<td align="right">-.255</td>
</tr>
<tr class="even">
<td align="left">Positive</td>
<td align="right">-.081</td>
<td align="right"><strong>.192</strong></td>
</tr>
<tr class="odd">
<td align="left">Shared</td>
<td align="right">.103</td>
<td align="right">.281</td>
</tr>
</tbody>
</table>
<div class="figure" style="text-align: center"><span style="display:block;" id="fig:plot-imdb"></span>
<div class="plotly html-widget html-fill-item" id="htmlwidget-3" style="width:768px;height:576px;"></div>
<script type="application/json" data-for="htmlwidget-3">{"x":{"data":[{"x":[0.18571654615554042,0.099809048091466776,0.54682437946265205,0.36439482278066504,0.1271384093140395,0.010783944221508043,-0.0063987379304191756,0.17527218291872354,0.11769089497754716,0.011732048948378102,0.40392818628601812,0.028398043898213224,0.15777405945755296,0.10697917310366636,0.074886297073802885],"y":[-0.23334987004789814,-0.17579882921465306,-1.1752727275347559,-0.98471194409893514,-0.17245095481958869,-0.10061200644101291,-0.13837537217875012,-0.15878048299828953,-0.14217651842001292,-0.010123057993395339,-0.74957240804973946,-0.16231526431483859,-0.26943787686463971,-0.027475471610646077,-0.18593059282974145],"text":["bad (Negative)","worst (Negative)","waste (Negative)","awful (Negative)","poor (Negative)","nothing (Negative)","stupid (Negative)","terrible (Negative)","worse (Negative)","even (Negative)","script (Negative)","boring (Negative)","money (Negative)","plot (Negative)","horrible (Negative)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(213,94,0,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(213,94,0,1)"}},"hoveron":"points","name":"Negative","legendgroup":"Negative","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.081744018423949641,-0.047571536237173584,-0.10790862327796238,-0.0090255765592872038,-0.080712604885222236,-0.12532619741627082,-0.029145246140461999,0.073327862399543531,0.067948240498652099,-0.057808595192257631,0.010966278713061849,-0.083590897681595416,0.010950317928239492,-0.0048006452696497818,-0.028322253846350821],"y":[0.46046284257107145,0.25777241131156203,0.22260967753319263,0.17887405184315525,0.28082563733867594,0.26691602855778812,0.18979696029567342,0.12467808537344016,0.086506620780249399,0.11553624444403925,0.067215601055555421,0.21969682869980814,0.10874128923843671,0.09570158770482054,0.15495837890857736],
"text":["great (Positive)","excellent (Positive)","best (Positive)","wonderful (Positive)","love (Positive)","family (Positive)","loved (Positive)","beautiful (Positive)","highly (Positive)","fun (Positive)","perfect (Positive)","also (Positive)","young (Positive)","favorite (Positive)","life (Positive)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(0,158,115,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(0,158,115,1)"}},"hoveron":"points","name":"Positive","legendgroup":"Positive","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[0.2432130402373899,0.16955934027247727,-0.01056565525436497,-0.015944273052570485,-0.034182546658513897,-0.06192679310864109,0.066431080163338083,-0.018622098253129622,0.037747591349934988,-2.2878754890645903],"y":[0.50445274476249602,0.39116896728561923,0.29371423476444708,0.182290602545849,0.16064197365573799,0.11595431615189583,0.37509156551691564,0.2460104522233707,0.21928355658824175,-0.6325172817337219],"text":["movie (Shared)","film (Shared)","good (Shared)","one (Shared)","like (Shared)","just (Shared)","story (Shared)","really (Shared)","see (Shared)","show 
(Shared)"],"type":"scatter","mode":"markers","marker":{"autocolorscale":false,"color":"rgba(123,45,142,1)","opacity":0,"size":0.10000000000000001,"symbol":"circle","line":{"width":1.8897637795275593,"color":"rgba(123,45,142,1)"}},"hoveron":"points","name":"Shared","legendgroup":"Shared","showlegend":false,"xaxis":"x","yaxis":"y","hoverinfo":"text","frame":null},{"x":[-0.081744018423949641,-0.047571536237173584,-0.10790862327796238,-0.0090255765592872038,-0.080712604885222236,-0.12532619741627082,-0.029145246140461999,0.073327862399543531,0.067948240498652099,-0.057808595192257631,0.010966278713061849,-0.083590897681595416,0.010950317928239492,-0.0048006452696497818,-0.028322253846350821],"y":[0.46046284257107145,0.25777241131156203,0.22260967753319263,0.17887405184315525,0.28082563733867594,0.26691602855778812,0.18979696029567342,0.12467808537344016,0.086506620780249399,0.11553624444403925,0.067215601055555421,0.21969682869980814,0.10874128923843671,0.09570158770482054,0.15495837890857736],"type":"scatter","mode":"text","text":["great","excellent","best","wonderful","love","family","loved","beautiful","highly","fun","perfect","also","young","favorite","life"],"textfont":{"size":11,"color":"#009E73"},"name":"Positive","legendgroup":"Positive","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"hovertext":["great (Positive)","excellent (Positive)","best (Positive)","wonderful (Positive)","love (Positive)","family (Positive)","loved (Positive)","beautiful (Positive)","highly (Positive)","fun (Positive)","perfect (Positive)","also (Positive)","young (Positive)","favorite (Positive)","life 
(Positive)"],"marker":{"color":"rgba(31,119,180,1)","line":{"color":"rgba(31,119,180,1)"}},"error_y":{"color":"rgba(31,119,180,1)"},"error_x":{"color":"rgba(31,119,180,1)"},"line":{"color":"rgba(31,119,180,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[1.9916913373928846],"y":[4.9719468113467888],"type":"scatter","mode":"markers","marker":{"color":"#009E73","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(255,127,14,1)"}},"name":"Positive","legendgroup":"Positive","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(255,127,14,1)"},"error_x":{"color":"rgba(255,127,14,1)"},"line":{"color":"rgba(255,127,14,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.18571654615554042,0.099809048091466776,0.36439482278066504,0.1271384093140395,0.010783944221508043,-0.0063987379304191756,0.17527218291872354,0.11769089497754716,0.011732048948378102,0.40392818628601812,0.028398043898213224,0.15777405945755296,0.10697917310366636,0.074886297073802885],"y":[-0.23334987004789814,-0.17579882921465306,-0.98471194409893514,-0.17245095481958869,-0.10061200644101291,-0.13837537217875012,-0.15878048299828953,-0.14217651842001292,-0.010123057993395339,-0.74957240804973946,-0.16231526431483859,-0.26943787686463971,-0.027475471610646077,-0.18593059282974145],"type":"scatter","mode":"text","text":["bad","worst","awful","poor","nothing","stupid","terrible","worse","even","script","boring","money","plot","horrible"],"textfont":{"size":11,"color":"#D55E00"},"name":"Negative","legendgroup":"Negative","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text","text","text","text","text","text"],"hovertext":["bad (Negative)","worst (Negative)","awful (Negative)","poor (Negative)","nothing (Negative)","stupid (Negative)","terrible (Negative)","worse (Negative)","even (Negative)","script (Negative)","boring (Negative)","money (Negative)","plot (Negative)","horrible 
(Negative)"],"marker":{"color":"rgba(44,160,44,1)","line":{"color":"rgba(44,160,44,1)"}},"error_y":{"color":"rgba(44,160,44,1)"},"error_x":{"color":"rgba(44,160,44,1)"},"line":{"color":"rgba(44,160,44,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[1.9916913373928846],"y":[4.9719468113467888],"type":"scatter","mode":"markers","marker":{"color":"#D55E00","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(214,39,40,1)"}},"name":"Negative","legendgroup":"Negative","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(214,39,40,1)"},"error_x":{"color":"rgba(214,39,40,1)"},"line":{"color":"rgba(214,39,40,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[0.2432130402373899,0.16955934027247727,-0.01056565525436497,-0.015944273052570485,-0.034182546658513897,-0.06192679310864109,0.066431080163338083,-0.018622098253129622,0.037747591349934988],"y":[0.50445274476249602,0.39116896728561923,0.29371423476444708,0.182290602545849,0.16064197365573799,0.11595431615189583,0.37509156551691564,0.2460104522233707,0.21928355658824175],"type":"scatter","mode":"text","text":["movie","film","good","one","like","just","story","really","see"],"textfont":{"size":11,"color":"#7B2D8E"},"name":"Shared","legendgroup":"Shared","showlegend":false,"hoverinfo":["text","text","text","text","text","text","text","text","text"],"hovertext":["movie (Shared)","film (Shared)","good (Shared)","one (Shared)","like (Shared)","just (Shared)","story (Shared)","really (Shared)","see 
(Shared)"],"marker":{"color":"rgba(148,103,189,1)","line":{"color":"rgba(148,103,189,1)"}},"error_y":{"color":"rgba(148,103,189,1)"},"error_x":{"color":"rgba(148,103,189,1)"},"line":{"color":"rgba(148,103,189,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":[1.9916913373928846],"y":[4.9719468113467888],"type":"scatter","mode":"markers","marker":{"color":"#7B2D8E","size":12,"opacity":1,"symbol":"circle","line":{"color":"rgba(140,86,75,1)"}},"name":"Shared","legendgroup":"Shared","showlegend":true,"hoverinfo":"skip","error_y":{"color":"rgba(140,86,75,1)"},"error_x":{"color":"rgba(140,86,75,1)"},"line":{"color":"rgba(140,86,75,1)"},"xaxis":"x","yaxis":"y","frame":null}],"layout":{"margin":{"t":16,"r":0,"b":60,"l":45.429638854296407},"paper_bgcolor":"rgba(255,255,255,1)","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"xaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-0.15708146043840815,0.43568344930815545],"tickmode":"array","ticktext":["-2","-1","0"],"tickvals":[-2,-1,0],"categoryorder":"array","categoryarray":["-2","-1","0"],"nticks":null,"ticks":"","tickcolor":null,"ticklen":3.9850560398505608,"tickwidth":0,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":12.7521793275218},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(235,235,235,1)","gridwidth":0,"zeroline":false,"anchor":"y","title":{"text":"Principal Component 
1","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"standoff":8},"hoverformat":".2f"},"yaxis":{"domain":[0,1],"automargin":true,"type":"linear","autorange":false,"range":[-1.0740618254306211,0.59380262609418188],"tickmode":"array","ticktext":["-1.0","-0.5","0.0","0.5"],"tickvals":[-1,-0.5,0,0.5],"categoryorder":"array","categoryarray":["-1.0","-0.5","0.0","0.5"],"nticks":null,"ticks":"","tickcolor":null,"ticklen":3.9850560398505608,"tickwidth":0,"showticklabels":true,"tickfont":{"color":"rgba(77,77,77,1)","family":"","size":12.7521793275218},"tickangle":-0,"showline":false,"linecolor":null,"linewidth":0,"showgrid":true,"gridcolor":"rgba(235,235,235,1)","gridwidth":0,"zeroline":false,"anchor":"x","title":{"text":"Principal Component 2","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243},"standoff":8},"hoverformat":".2f"},"shapes":[{"type":"rect","fillcolor":null,"line":{"color":null,"width":0,"linetype":[]},"yref":"paper","xref":"paper","layer":"below","x0":0,"x1":1,"y0":0,"y1":1}],"showlegend":true,"legend":{"bgcolor":"#FAFAFA","bordercolor":"#CCCCCC","borderwidth":1,"font":{"color":"rgba(0,0,0,1)","family":"","size":12},"title":{"text":"","font":{"color":"rgba(0,0,0,1)","family":"","size":15.940224159402243}},"orientation":"h","x":1,"xanchor":"right","y":-0.12,"tracegroupgap":4,"itemwidth":30,"itemsizing":"constant","xpad":4,"ypad":10},"hovermode":"closest","barmode":"relative"},"config":{"doubleClick":"reset","modeBarButtonsToAdd":["hoverclosest","hovercompare"],"showSendToCloud":false},"source":"A","attrs":{"2d184bbd4f35":{"x":{},"y":{},"colour":{},"text":{},"type":"scatter"},"2d184bbd4f35.1":{"x":[-0.081744018423949641,-0.047571536237173584,-0.10790862327796238,-0.0090255765592872038,-0.080712604885222236,-0.12532619741627082,-0.029145246140461999,0.073327862399543531,0.067948240498652099,-0.057808595192257631,0.010966278713061849,-0.083590897681595416,0.010950317928239492,-0.0048006452696497818,-0.028322253846350821],"y
":[0.46046284257107145,0.25777241131156203,0.22260967753319263,0.17887405184315525,0.28082563733867594,0.26691602855778812,0.18979696029567342,0.12467808537344016,0.086506620780249399,0.11553624444403925,0.067215601055555421,0.21969682869980814,0.10874128923843671,0.09570158770482054,0.15495837890857736],"type":"scatter","mode":"text","text":["great","excellent","best","wonderful","love","family","loved","beautiful","highly","fun","perfect","also","young","favorite","life"],"textfont":{"size":11,"color":"#009E73"},"name":"Positive","legendgroup":"Positive","showlegend":false,"hoverinfo":"text","hovertext":["great (Positive)","excellent (Positive)","best (Positive)","wonderful (Positive)","love (Positive)","family (Positive)","loved (Positive)","beautiful (Positive)","highly (Positive)","fun (Positive)","perfect (Positive)","also (Positive)","young (Positive)","favorite (Positive)","life (Positive)"],"inherit":false},"2d184bbd4f35.2":{"x":1.9916913373928846,"y":4.9719468113467888,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#009E73","opacity":1,"symbol":"circle"},"name":"Positive","legendgroup":"Positive","showlegend":true,"hoverinfo":"skip","inherit":false},"2d184bbd4f35.3":{"x":[0.18571654615554042,0.099809048091466776,0.36439482278066504,0.1271384093140395,0.010783944221508043,-0.0063987379304191756,0.17527218291872354,0.11769089497754716,0.011732048948378102,0.40392818628601812,0.028398043898213224,0.15777405945755296,0.10697917310366636,0.074886297073802885],"y":[-0.23334987004789814,-0.17579882921465306,-0.98471194409893514,-0.17245095481958869,-0.10061200644101291,-0.13837537217875012,-0.15878048299828953,-0.14217651842001292,-0.010123057993395339,-0.74957240804973946,-0.16231526431483859,-0.26943787686463971,-0.027475471610646077,-0.18593059282974145],"type":"scatter","mode":"text","text":["bad","worst","awful","poor","nothing","stupid","terrible","worse","even","script","boring","money","plot","horrible"],"textfont":{"size":11,"color":"#D55
E00"},"name":"Negative","legendgroup":"Negative","showlegend":false,"hoverinfo":"text","hovertext":["bad (Negative)","worst (Negative)","awful (Negative)","poor (Negative)","nothing (Negative)","stupid (Negative)","terrible (Negative)","worse (Negative)","even (Negative)","script (Negative)","boring (Negative)","money (Negative)","plot (Negative)","horrible (Negative)"],"inherit":false},"2d184bbd4f35.4":{"x":1.9916913373928846,"y":4.9719468113467888,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#D55E00","opacity":1,"symbol":"circle"},"name":"Negative","legendgroup":"Negative","showlegend":true,"hoverinfo":"skip","inherit":false},"2d184bbd4f35.5":{"x":[0.2432130402373899,0.16955934027247727,-0.01056565525436497,-0.015944273052570485,-0.034182546658513897,-0.06192679310864109,0.066431080163338083,-0.018622098253129622,0.037747591349934988],"y":[0.50445274476249602,0.39116896728561923,0.29371423476444708,0.182290602545849,0.16064197365573799,0.11595431615189583,0.37509156551691564,0.2460104522233707,0.21928355658824175],"type":"scatter","mode":"text","text":["movie","film","good","one","like","just","story","really","see"],"textfont":{"size":11,"color":"#7B2D8E"},"name":"Shared","legendgroup":"Shared","showlegend":false,"hoverinfo":"text","hovertext":["movie (Shared)","film (Shared)","good (Shared)","one (Shared)","like (Shared)","just (Shared)","story (Shared)","really (Shared)","see (Shared)"],"inherit":false},"2d184bbd4f35.6":{"x":1.9916913373928846,"y":4.9719468113467888,"type":"scatter","mode":"markers","marker":{"size":12,"color":"#7B2D8E","opacity":1,"symbol":"circle"},"name":"Shared","legendgroup":"Shared","showlegend":true,"hoverinfo":"skip","inherit":false}},"cur_data":"2d184bbd4f35","visdat":{"2d184bbd4f35":["function (y) 
","x"]},"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.20000000000000001,"selected":{"opacity":1},"debounce":0},"shinyEvents":["plotly_hover","plotly_click","plotly_selected","plotly_relayout","plotly_brushed","plotly_brushing","plotly_clickannotation","plotly_doubleclick","plotly_deselect","plotly_afterplot","plotly_sunburstclick"],"base_url":"https://plot.ly"},"evals":[],"jsHooks":[]}</script>
<p class="caption">
Figure 3: <em>Word Vectors from 5,000 IMDB Film Reviews (Positive vs Negative) Projected to Two Dimensions.</em> Unlike the clean topic-based separations in the Reuters and SOTU corpora, the sentiment-based distinction is much muddier: positive and negative reviews share most of their vocabulary, and evaluative words overlap heavily. Select an area of the plot to zoom in; double-click to reset.
</p>
</div>
</div>
<div id="what-the-plots-capture-and-what-they-miss" class="section level3">
<h3>What the Plots Capture – and What They Miss</h3>
<p>Taken together, Figures 1–3 illustrate both the power and the limits of distributional models. Figure 1 captures the real-world distinction between financial and energy markets with striking clarity: domain-specific vocabulary clusters tightly, and a polysemous word like ‘price’ lands in different positions depending on its dominant context – precisely the kind of structure that Louwerse and colleagues have documented at larger scale. Figure 2 captures genuine historical change: pre-war addresses use the vocabulary of a young republic (‘gentlemen’, ‘militia’, ‘vessels’); modern ones use the vocabulary of a superpower (‘tonight’, ‘jobs’, ‘nuclear’), reflecting two centuries of political evolution.</p>
<p>Figure 3, however, reveals a clear limitation. Because both positive and negative reviews discuss the same subject – films – the topical vocabulary is largely shared, and the evaluative words that do separate them (‘excellent’ vs ‘worst’, for instance) form only a thin layer atop a large common vocabulary. A 20-dimensional LSA space simply lacks the resolution to untangle sentiment from topic. The model captures <em>what</em> people write about more easily than <em>how</em> they feel about it.</p>
<p>These imprecisions are not accidental; they reflect a fundamental constraint: model capacity.</p>
</div>
<div id="from-toy-models-to-titans" class="section level2">
<h2>From Toy Models to Titans</h2>
<p>The LSA spaces above used just 20 latent dimensions, trained on corpora of a few dozen to a few thousand documents. The vocabulary that survives the minimum-frequency filter numbers in the low thousands. Under these conditions, the model does a remarkable job of sorting finance from energy or 19th-century language from modern – but it lacks the capacity to encode the subtler distributional cues that distinguish evaluative tone, sarcasm or register.</p>
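<p>That pipeline is compact enough to sketch in base R. The corpus, words and the choice of <code>k = 2</code> below are invented for illustration (the spaces in this post use 20 dimensions on real corpora): build a term-document count matrix, truncate its singular value decomposition to <em>k</em> dimensions, and compare words by cosine similarity in the reduced space.</p>

```r
# A minimal LSA sketch in base R on a toy, invented corpus.
docs <- list(
  c("stock", "price", "market", "shares"),
  c("oil", "price", "barrel", "energy"),
  c("stock", "market", "shares", "trading")
)
vocab <- sort(unique(unlist(docs)))

# Rows = terms, columns = documents, cells = raw counts.
tdm <- sapply(docs, function(d) table(factor(d, levels = vocab)))
rownames(tdm) <- vocab

k <- 2                       # latent dimensions (toy value)
dec <- svd(tdm)              # tdm = U D V'
word_vecs <- dec$u[, 1:k] %*% diag(dec$d[1:k])  # terms in the reduced space
rownames(word_vecs) <- vocab

cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Words that keep the same company end up close together:
cosine(word_vecs["stock", ], word_vecs["shares", ])
cosine(word_vecs["stock", ], word_vecs["barrel", ])
```

<p>Packages such as <code>lsa</code> and <code>LSAfun</code> (Günther et al., 2016) implement the same idea at scale, with weighting schemes and convenience functions this sketch omits.</p>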
<p>The history of distributional models is, in large part, a history of scale. As <a href="https://doi.org/10.1177/09637214241242746" rel="nofollow" target="_blank">Connell and Lynott (2024)</a> illustrate, growth in model size over the past three decades has been staggering. The LSA models of the late 1990s (<a href="https://doi.org/10.1037/0033-295X.104.2.211" rel="nofollow" target="_blank">Landauer & Dumais, 1997</a>) had a few hundred latent dimensions and were trained on roughly 30,000 documents – already enough to pass synonym tests at near-human levels. Word2Vec (<a href="https://doi.org/10.48550/arXiv.1301.3781" rel="nofollow" target="_blank">Mikolov et al., 2013</a>) moved to shallow neural networks with a few million learnable parameters trained on billions of words. Then came the Transformer-based models, and the scale exploded: BERT (<a href="https://doi.org/10.18653/v1/N19-1423" rel="nofollow" target="_blank">Devlin et al., 2019</a>) had 340 million parameters, GPT-3 (<a href="https://doi.org/10.48550/arXiv.2005.14165" rel="nofollow" target="_blank">Brown et al., 2020</a>) reached 175 billion, and today’s largest models are estimated at well over a trillion parameters, trained on text corpora so vast they encompass a substantial fraction of everything ever written on the internet.</p>
<p>The core principle has not changed: predict the next word on the basis of the company it keeps. What has changed is capacity. A model with 20 dimensions and 10,000 words can distinguish finance from energy; a model with billions of parameters and trillions of training tokens can distinguish a Shakespearean sonnet from a legal brief, track the implications of a subordinate clause across a 3,000-word passage and generate fluent prose in dozens of languages. Generative AI was not built on a fundamentally new idea about language – it was built by scaling Firth’s old idea up by many orders of magnitude and combining it with a crucial algorithmic innovation.</p>
</div>
<div id="the-transformer-revolution" class="section level2">
<h2>The Transformer Revolution</h2>
<p>That algorithmic innovation was the Transformer, introduced by <a href="https://doi.org/10.48550/arXiv.1706.03762" rel="nofollow" target="_blank">Vaswani et al.</a> in their 2017 paper ‘Attention Is All You Need’. Earlier language models relied on recurrent or convolutional neural networks, which processed words sequentially – reading a sentence one word at a time while trying to hold everything so far in memory. The approach worked, after a fashion, but it was slow and struggled with long-range dependencies.</p>
<p>The Transformer replaced all of that with <em>multi-head self-attention</em>: a mechanism that lets the model weigh every word in a passage simultaneously, comparing each one directly with every other. In plain terms, attention allows the model to ask, for each word, ‘which other words here matter most for understanding me?’ The idea is simple but transformative. It outperformed existing models on translation and a host of other tasks – without any recurrence or convolution – and was far faster to train in parallel.</p>
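<p>The core computation is small enough to sketch in base R. The matrices <code>Q</code>, <code>K</code> and <code>V</code> below are random stand-ins for the learned query, key and value projections of one attention head; real Transformers learn these projections and run many heads in parallel.</p>

```r
# Scaled dot-product attention (Vaswani et al., 2017), single head, base R.
softmax_rows <- function(m) {
  e <- exp(m - apply(m, 1, max))  # subtract each row's max for stability
  e / rowSums(e)
}

attention <- function(Q, K, V) {
  scores  <- (Q %*% t(K)) / sqrt(ncol(K))  # every token scored against every other
  weights <- softmax_rows(scores)          # rows sum to 1: 'which words matter for me?'
  weights %*% V                            # weighted mix of value vectors
}

set.seed(1)
n_tokens <- 4; d <- 8
Q <- matrix(rnorm(n_tokens * d), n_tokens)
K <- matrix(rnorm(n_tokens * d), n_tokens)
V <- matrix(rnorm(n_tokens * d), n_tokens)

out <- attention(Q, K, V)
dim(out)  # one updated vector per token
```

<p>Because every pairwise score is a matrix product, the whole passage is processed at once rather than word by word – which is what made training on massive corpora feasible.</p>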
<p>With Transformers in hand, NLP entered a new era. Large pretrained models like BERT (<a href="https://doi.org/10.18653/v1/N19-1423" rel="nofollow" target="_blank">Devlin et al., 2019</a>) and the GPT series (<a href="https://doi.org/10.48550/arXiv.2005.14165" rel="nofollow" target="_blank">Brown et al., 2020</a>) set successive benchmarks for language understanding and generation. The combination of the Transformer architecture with the massive scale described above – hundreds of billions of parameters trained on essentially the whole internet – is what made generative AI possible. From Firth’s insight about co-occurrence, through LSA’s matrix decompositions and Word2Vec’s neural embeddings, to the attention-powered behemoths of today, the thread is continuous: predict the next word on the basis of the company it keeps. But despite their extraordinary power, these models remain <em>predictors of text</em>, not infallible oracles of truth. The Transformer revolution made the storyteller more eloquent; it did not make the storyteller more honest.</p>
</div>
<div id="fluency-is-not-truth" class="section level2">
<h2>Fluency Is Not Truth</h2>
<p>Crucially, LLMs are optimised for fluency, not truth. They have no built-in fact-checking; they simply predict plausible continuations. As <a href="https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf" rel="nofollow" target="_blank">Radford et al. (2019)</a> showed with GPT-2, the training objective is straightforward: learn to predict the next token in a sequence, given all preceding tokens. The loss function rewards fluent, likely text – but it never rewards the model for replying ‘I don’t know.’ <a href="https://doi.org/10.18653/v1/2022.acl-long.229" rel="nofollow" target="_blank">Lin et al. (2022)</a> demonstrated with their TruthfulQA benchmark that models frequently produce confident but false answers rather than admitting uncertainty, and that larger models can actually perform <em>worse</em> on truthfulness because they are better at reproducing convincing-sounding misinformation from their training data. The upshot: models tend to guess when unsure, and they guess with alarming confidence.</p>
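<p>A toy calculation (all probabilities invented) makes the asymmetry concrete: the loss at each position is the negative log-probability the model assigned to the token that actually came next, so a confident correct guess is cheap, while there is no term anywhere that rewards expressing uncertainty.</p>

```r
# Next-token objective on made-up numbers: suppose the model's predicted
# distribution over the next token, given 'the cat ...', is:
p_next <- c(the = 0.05, cat = 0.10, sat = 0.60, mat = 0.15, was = 0.10)

loss_confident_right <- -log(p_next["sat"])  # true next token was "sat"
loss_low_confidence  <- -log(0.01)           # true token got only 1% probability

loss_confident_right  # small loss: confident and correct
loss_low_confidence   # large loss: the objective punishes spread-out probability
```

<p>Hedging mass onto an 'I don't know' continuation only raises the loss on whatever token actually follows – which is one intuition for why models guess, and guess confidently.</p>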
<p>Consider what this means in practice. Ask an LLM about a niche historical event, and it may cheerfully invent plausible-sounding details – dates, names, citations – that are entirely fabricated. Ask it about a scientific finding at the edges of its training data, and it may blend two real studies into one fictional hybrid, complete with a convincing journal name. This phenomenon, known as <em>hallucination</em>, is not a bug that will eventually be patched away; it is a structural feature of how these models work. <a href="https://doi.org/10.48550/arXiv.2401.11817" rel="nofollow" target="_blank">Xu et al. (2024)</a> demonstrated formally that if an LLM cannot reliably distinguish true from false statements in its training data, hallucinations are mathematically inevitable. The model’s very fluency becomes its greatest liability: it weaves a convincing narrative whether or not the underlying facts support it. In short, current LLMs are trained to be good <em>storytellers</em>, not guaranteed <em>truth-tellers</em>.</p>
</div>
<div id="why-prompts-matter-and-why-one-is-rarely-enough" class="section level2">
<h2>Why Prompts Matter – and Why One Is Rarely Enough</h2>
<p>Because of this predictive nature, prompt engineering is essential. A vague or generic question will often yield a superficial, off‑target or simply wrong answer. One <a href="https://cloud.google.com/discover/what-is-prompt-engineering" rel="nofollow" target="_blank">guide defines prompt engineering</a> as ‘the art and science of designing and optimising prompts to guide AI models towards generating the desired responses’. That sounds rather grand, but in practice it often means something as prosaic as adding context, specifying a format, giving an example or two, and then refining iteratively until the output is actually useful.</p>
<p>The sensitivity of LLMs to phrasing is remarkable – and, on first encounter, a little humbling. Asking ‘What are some criticisms of capitalism?’ and ‘What are the main drawbacks of market economies?’ can elicit strikingly different responses, even though the questions are conceptually near‑identical. <a href="https://doi.org/10.48550/arXiv.2310.11324" rel="nofollow" target="_blank">Sclar et al. (2024)</a> showed that even tiny changes – swapping a single word, reordering a clause, adding an explicit instruction to be concise – can dramatically alter what a model produces, with performance varying by up to 76 percentage points across prompt formats for the same task. <a href="https://doi.org/10.1038/s41746-024-01029-4" rel="nofollow" target="_blank">Wang et al. (2024)</a> found that well-engineered prompts can yield ‘ideal and stable answers’, but that different formulations can have very different effects on performance. Researchers testing LLMs rigorously often try dozens of prompt variants to achieve reliable output. A single query is, in most cases, simply not enough.</p>
<p>There is also the matter of <em>role</em>, <em>tone</em> and <em>constraints</em>. Instructing the model to respond as a sceptical scientist, a sympathetic teacher or a meticulous copy‑editor changes its behaviour markedly. Asking it to respond in plain English, to avoid jargon, to stay under 150 words or to number its assumptions shapes the answer in ways a bare question never could. Each of these additions is, in Firth’s terms, part of the ‘company’ the prompt keeps – and consequently part of what determines the model’s response.</p>
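<p>These ingredients can be made concrete with a small template helper. Everything here is illustrative: <code>build_prompt</code> is a made-up function, not any particular library's API, and the role and constraints are just strings prepended and appended to the question.</p>

```r
# A purely illustrative prompt-template helper: role, question, constraints.
build_prompt <- function(question, role = NULL, constraints = character(0)) {
  parts <- c(
    if (!is.null(role)) paste0("You are ", role, "."),
    question,
    if (length(constraints) > 0)
      paste("Constraints:", paste(constraints, collapse = "; "))
  )
  paste(parts, collapse = "\n")
}

cat(build_prompt(
  "What are the main drawbacks of market economies?",
  role = "a sceptical economist",
  constraints = c("plain English", "under 150 words", "number your assumptions")
))
```

<p>Swapping the role or dropping a constraint changes the 'company' the question keeps – and, as the studies above show, potentially the answer.</p>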
</div>
<div id="what-good-prompting-looks-like" class="section level2">
<h2>What Good Prompting Looks Like</h2>
<p>The savvy user treats an LLM as a collaborator requiring careful, iterative guidance – not a search engine that delivers verdicts on demand. Wherever you can, provide ample background information, and invite the model to ask you questions before it responds. Treat its first reply as a draft, not a conclusion. It is worth asking follow‑up questions, pushing back on suspect claims, requesting sources or alternative views, and rephrasing when the model goes off track. Ask it to explain its reasoning. Ask it to consider counter‑arguments. Ask it to flag what it is uncertain about. Each move draws more of the model’s latent capability to the surface.</p>
<p>This iterative approach mirrors good intellectual practice more generally. A scientist does not run one experiment and publish; they replicate, vary conditions and triangulate across methods. A journalist does not accept a single source; they seek corroboration. A doctor does not diagnose on one symptom; they gather a fuller picture. Using an LLM well requires the same instinct: treat each exchange as one data point in an ongoing investigation, not as the final word.</p>
</div>
<div id="a-powerful-tool-not-an-oracle" class="section level2">
<h2>A Powerful Tool, Not an Oracle</h2>
<p>Using an LLM is a bit like navigating a foreign city without a map: you will stumble upon genuinely useful places, but you will also take wrong turns, end up in dead ends, and occasionally find yourself confidently heading in exactly the wrong direction. These models will often produce accurate information, because language genuinely encodes reality – words cluster around what they describe, and texts about geography, commodity markets or sensory properties track how the world actually works (<a href="https://doi.org/10.1111/j.1551-6709.2008.01003.x" rel="nofollow" target="_blank">Louwerse & Zwaan, 2009</a>; <a href="https://doi.org/10.1111/j.1756-8765.2010.01106.x" rel="nofollow" target="_blank">Louwerse, 2011</a>). But an LLM is not inherently geared towards truth: hallucinations are not a bug to be patched but a mathematical inevitability of the architecture (<a href="https://doi.org/10.48550/arXiv.2401.11817" rel="nofollow" target="_blank">Xu et al., 2024</a>). The underlying mechanism is still word co-occurrence – Firth’s old principle, scaled up. Neither the brute force of massive training data (<a href="https://doi.org/10.1177/09637214241242746" rel="nofollow" target="_blank">Connell & Lynott, 2024</a>) nor the ingenious attention mechanisms of modern architectures (<a href="https://doi.org/10.48550/arXiv.1706.03762" rel="nofollow" target="_blank">Vaswani et al., 2017</a>) has <em>yet</em> tamed this heuristic machine into a reliable truth-teller.</p>
<p>Good results take deliberate effort. A well-crafted prompt – with specific context, clear constraints, iterative refinement and healthy scepticism – does not transform the model into a truth engine. What it does is steer its predictions towards the regions of language that most faithfully reflect the world. Skip that effort, and arriving at the right destination becomes a matter of luck rather than design.</p>
<p>Firth’s insight about words applies equally to prompts: you shall know an answer by the company the question keeps (<a href="https://doi.org/10.48550/arXiv.2411.10541" rel="nofollow" target="_blank">He et al., 2024</a>; <a href="https://doi.org/10.1038/s41746-024-01029-4" rel="nofollow" target="_blank">Wang et al., 2024</a>).</p>
</div>
<div id="references" class="section level2">
<h2>References</h2>
<p>Bernabeu, P. (2022). <em>Language and sensorimotor simulation in conceptual processing: Multilevel analysis and statistical power</em> [Doctoral thesis, Lancaster University]. <a href="https://doi.org/10.17635/lancaster/thesis/1795" class="uri" rel="nofollow" target="_blank">https://doi.org/10.17635/lancaster/thesis/1795</a></p>
<p>Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. In <em>Advances in Neural Information Processing Systems</em> (Vol. 33, pp. 1877–1901). <a href="https://doi.org/10.48550/arXiv.2005.14165" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.2005.14165</a></p>
<p>Brunila, M., & LaViolette, J. (2022). What company do words keep? Revisiting the distributional semantics of J.R. Firth & Zellig Harris. <em>Proceedings of NAACL 2022</em>. <a href="https://doi.org/10.18653/v1/2022.naacl-main.327" class="uri" rel="nofollow" target="_blank">https://doi.org/10.18653/v1/2022.naacl-main.327</a></p>
<p>Connell, L., & Lynott, D. (2024). What can language models tell us about human cognition? <em>Current Directions in Psychological Science</em>. <a href="https://doi.org/10.1177/09637214241242746" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1177/09637214241242746</a></p>
<p>Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In <em>Proceedings of NAACL-HLT 2019</em> (pp. 4171–4186). Association for Computational Linguistics. <a href="https://doi.org/10.18653/v1/N19-1423" class="uri" rel="nofollow" target="_blank">https://doi.org/10.18653/v1/N19-1423</a></p>
<p>Feinerer, I., Hornik, K., & Meyer, D. (2008). Text mining infrastructure in R. <em>Journal of Statistical Software, 25</em>(5), 1–54. <a href="https://doi.org/10.18637/jss.v025.i05" class="uri" rel="nofollow" target="_blank">https://doi.org/10.18637/jss.v025.i05</a></p>
<p>Firth, J. R. (1957). <em>Studies in Linguistic Analysis</em>. Basil Blackwell.</p>
<p>Günther, F., Dudschig, C., & Kaup, B. (2016). LSAfun: An R package for computations based on Latent Semantic Analysis. <em>Behavior Research Methods, 48</em>(2), 409–421. <a href="https://doi.org/10.3758/s13428-015-0662-x" class="uri" rel="nofollow" target="_blank">https://doi.org/10.3758/s13428-015-0662-x</a></p>
<p>He, J., Rungta, M., Koleczek, D., Sekhon, A., Wang, F. X., & Hasan, S. (2024). Does prompt formatting have any impact on LLM performance? <em>arXiv</em>. <a href="https://doi.org/10.48550/arXiv.2411.10541" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.2411.10541</a></p>
<p>Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. <em>Psychological Review, 104</em>(2), 211–240. <a href="https://doi.org/10.1037/0033-295X.104.2.211" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1037/0033-295X.104.2.211</a></p>
<p>Lewis, D. D. (1997). <em>Reuters-21578 text categorization test collection, distribution 1.0</em> [Dataset]. AT&T Bell Laboratories. <a href="http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html" class="uri" rel="nofollow" target="_blank">http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html</a></p>
<p>Lin, S., Hilton, J., & Evans, O. (2022). TruthfulQA: Measuring how models mimic human falsehoods. In <em>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics</em> (pp. 3214–3252). <a href="https://doi.org/10.18653/v1/2022.acl-long.229" class="uri" rel="nofollow" target="_blank">https://doi.org/10.18653/v1/2022.acl-long.229</a></p>
<p>Louwerse, M. M. (2011). Symbol interdependency in symbolic and embodied cognition. <em>Topics in Cognitive Science, 3</em>(2), 273–302. <a href="https://doi.org/10.1111/j.1756-8765.2010.01106.x" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1111/j.1756-8765.2010.01106.x</a></p>
<p>Louwerse, M., & Connell, L. (2011). A taste of words: Linguistic context and perceptual simulation predict the modality of words. <em>Cognitive Science, 35</em>(2), 381–398. <a href="https://doi.org/10.1111/j.1551-6709.2010.01157.x" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1111/j.1551-6709.2010.01157.x</a></p>
<p>Louwerse, M. M., & Zwaan, R. A. (2009). Language encodes geographical information. <em>Cognitive Science, 33</em>(1), 51–73. <a href="https://doi.org/10.1111/j.1551-6709.2008.01003.x" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1111/j.1551-6709.2008.01003.x</a></p>
<p>Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. <em>Behavior Research Methods, Instruments, & Computers, 28</em>(2), 203–208. <a href="https://doi.org/10.3758/BF03204766" class="uri" rel="nofollow" target="_blank">https://doi.org/10.3758/BF03204766</a></p>
<p>Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. <em>arXiv</em>. <a href="https://doi.org/10.48550/arXiv.1301.3781" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.1301.3781</a></p>
<p>Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. <em>OpenAI Blog</em>. <a href="https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf" class="uri" rel="nofollow" target="_blank">https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf</a></p>
<p>Sclar, M., Choi, Y., Tsvetkov, Y., & Suhr, A. (2024). Quantifying language models’ sensitivity to spurious features in prompt design. In <em>Proceedings of ICLR 2024</em>. <a href="https://doi.org/10.48550/arXiv.2310.11324" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.2310.11324</a></p>
<p>Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In <em>Advances in Neural Information Processing Systems</em> (Vol. 30). <a href="https://doi.org/10.48550/arXiv.1706.03762" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.1706.03762</a></p>
<p>Wang, L., Chen, X., Deng, X., Wen, H., You, M., Liu, W., Li, Q., & Li, J. (2024). Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. <em>NPJ Digital Medicine, 7</em>, Article 41. <a href="https://doi.org/10.1038/s41746-024-01029-4" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1038/s41746-024-01029-4</a></p>
<p>Wu, M., Conde, J., Reviriego, P., & Brysbaert, M. (2026). How does fine-tuning improve sensorimotor representations in large language models? <em>arXiv</em>. <a href="https://doi.org/10.48550/arXiv.2603.03313" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.2603.03313</a></p>
<p>Xu, Z., Jain, S., & Kankanhalli, M. (2024). Hallucination is inevitable: An innate limitation of large language models. <em>arXiv</em>. <a href="https://doi.org/10.48550/arXiv.2401.11817" class="uri" rel="nofollow" target="_blank">https://doi.org/10.48550/arXiv.2401.11817</a></p>
<p>Xu, Q., Peng, Y., Nastase, S. A., Chodorow, M., Wu, M., & Li, P. (2025). Large language models without grounding recover non-sensorimotor but not sensorimotor features of human concepts. <em>Nature Human Behaviour, 9</em>(9), 1871–1886. <a href="https://doi.org/10.1038/s41562-025-02203-8" class="uri" rel="nofollow" target="_blank">https://doi.org/10.1038/s41562-025-02203-8</a></p>
</div>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://pablobernabeu.github.io/2026/you-shall-know-a-word-by-the-company-it-keeps/"> Pablo Bernabeu</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/you-shall-know-a-word-by-the-company-it-keeps-so-choose-your-prompts-wisely/">You shall know a word by the company it keeps — so choose your prompts wisely</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400072</post-id>	</item>
		<item>
		<title>Digital Biology with R: Advanced Bioinformatics, Predictive Modeling, and Time Series Analysis for Modern Life Sciences</title>
		<link>https://www.r-bloggers.com/2026/03/digital-biology-with-r-advanced-bioinformatics-predictive-modeling-and-time-series-analysis-for-modern-life-sciences/</link>
		
		<dc:creator><![CDATA[rprogrammingbooks]]></dc:creator>
		<pubDate>Mon, 23 Mar 2026 20:48:03 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://rprogrammingbooks.com/?p=2528</guid>

					<description><![CDATA[<p>Digital Biology with R Digital biology is no longer a niche intersection between biology and computation. It has become a core framework for how modern laboratories, biomedical teams, and translational researchers generate insight from complex biological systems. Whether the objective is to identify gene-expression signatures, model disease progression, classify patient ...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/digital-biology-with-r-advanced-bioinformatics-predictive-modeling-and-time-series-analysis-for-modern-life-sciences/">Digital Biology with R: Advanced Bioinformatics, Predictive Modeling, and Time Series Analysis for Modern Life Sciences</a>]]></description>
										<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://rprogrammingbooks.com/digital-biology-with-r/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=digital-biology-with-r"> Blog - R Programming Books</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <meta name="description" content="A professional macro-post on digital biology with R, covering bioinformatics workflows, predictive modeling, time series analysis, transcriptomics, visualization, and reproducible research in modern life sciences." />
  <meta name="keywords" content="digital biology with R, bioinformatics in R, transcriptomics R, predictive modeling medical data, time series analysis with R, healthcare analytics with R, Bioconductor, DESeq2, ggplot2, computational biology, biological data science" />
  <meta name="author" content="OpenAI" />
  <title>Digital Biology with R</title>
  <style>
    body {
      font-family: Arial, Helvetica, sans-serif;
      line-height: 1.75;
      color: #1f2937;
      max-width: 1100px;
      margin: 0 auto;
      padding: 40px 24px;
      background: #ffffff;
    }
    h2, h3, h4 {
      color: #0f172a;
      margin-top: 36px;
      margin-bottom: 14px;
    }
    p {
      margin-bottom: 18px;
    }
    pre {
      background: #0b1020;
      color: #e5e7eb;
      padding: 18px;
      overflow-x: auto;
      border-radius: 10px;
      margin: 22px 0;
      font-size: 14px;
      line-height: 1.55;
    }
    code {
      font-family: Consolas, Monaco, monospace;
    }
    a {
      color: #1d4ed8;
      text-decoration: none;
    }
    a:hover {
      text-decoration: underline;
    }
    .lead {
      font-size: 1.08rem;
      color: #374151;
    }
    .note {
      background: #f8fafc;
      border-left: 4px solid #2563eb;
      padding: 16px 18px;
      margin: 20px 0;
      border-radius: 8px;
    }
    .section {
      margin-bottom: 28px;
    }
    ul {
      margin: 14px 0 20px 24px;
    }
    li {
      margin-bottom: 8px;
    }
    .closing {
      background: #f9fafb;
      padding: 22px;
      border-radius: 12px;
      margin-top: 34px;
    }
  </style>
</head>
<body>

  <div class="section">
    <p class="lead">
      Digital biology is no longer a niche intersection between biology and computation. It has become a core framework for how modern laboratories, biomedical teams, and translational researchers generate insight from complex biological systems. Whether the objective is to identify gene-expression signatures, model disease progression, classify patient subgroups, or study temporal changes in biological signals, the ability to work fluently with data is now inseparable from the practice of advanced life science.
    </p>

    <p>
      In this context, <strong>R</strong> remains one of the most powerful and professionally relevant environments for biological data science. Its strengths go far beyond general statistics. R provides a mature ecosystem for reproducible analysis, publication-grade visualization, predictive modeling, medical data interpretation, and high-dimensional biological workflows. For teams working across transcriptomics, clinical analytics, systems biology, or longitudinal biosignal analysis, digital biology with R offers both depth and flexibility.
    </p>

    <p>
      A serious digital biology workflow in R typically combines several capabilities at once: structured data import, metadata harmonization, exploratory analysis, statistical modeling, machine learning, time-aware biological interpretation, and clear communication of findings. This is precisely why concepts associated with
      <a href="https://rprogrammingbooks.com/product/healthcare-analytics-r-predictive-modeling-medical-data/" rel="nofollow" target="_blank">predictive modeling for medical data in R</a>
      and
      <a href="https://rprogrammingbooks.com/product/time-series-analysis-in-r-book/" rel="nofollow" target="_blank">time series analysis with R</a>
      are becoming increasingly relevant in computational biology. Even professionals whose core focus is omics data benefit from thinking more broadly about biomedical prediction and temporal biological structure.
    </p>

    <div class="note">
      <p>
        From a strategic learning perspective, this is one reason why resources such as
        <a href="https://rprogrammingbooks.com/product/healthcare-analytics-r-predictive-modeling-medical-data/" rel="nofollow" target="_blank"><strong>Healthcare Analytics with R: Predictive Modeling for Medical Data</strong></a>
        and
        <a href="https://rprogrammingbooks.com/product/time-series-analysis-in-r-book/" rel="nofollow" target="_blank"><strong>Time Series Analysis with R</strong></a>
        fit naturally into a digital biology skill set. Even when the application is not purely clinical or purely forecasting-oriented, both domains strengthen the analytical mindset required for modern biological data interpretation.
      </p>
    </div>
  </div>

  <div class="section">
    <h2>Why R is a Professional Standard in Digital Biology</h2>

    <p>
      The case for R in digital biology is not simply historical. It is practical. Biological datasets are noisy, heterogeneous, high-dimensional, and deeply contextual. Unlike generic analytics workflows, biological interpretation demands tools that can handle structured experimental design, repeated measurements, batch effects, sparse signals, and biologically meaningful visualization. R is exceptionally strong in these areas.
    </p>

    <p>
      Several features explain its enduring relevance:
    </p>

    <ul>
      <li>Rich statistical foundations for biological inference</li>
      <li>Outstanding visualization via packages such as <code>ggplot2</code></li>
      <li>Robust bioinformatics infrastructure through <code>Bioconductor</code></li>
      <li>Flexible modeling for clinical and biomedical prediction</li>
      <li>Excellent support for reproducible research and reporting</li>
      <li>Strong support for longitudinal and time-dependent data analysis</li>
    </ul>

    <p>
      In other words, R is not merely a coding language for scientists. It is a full analytical environment for translating biological complexity into evidence.
    </p>
  </div>

  <div class="section">
    <h2>Core Setup for a Digital Biology Workflow in R</h2>

    <p>
      Any professional analysis should begin with a clean, explicit computational environment. This improves reproducibility, allows collaborators to review assumptions, and reduces hidden sources of variation. Below is a practical setup that combines general data science tools with packages often used in transcriptomics, statistical learning, and biological visualization.
    </p>

<pre># Core data wrangling and visualization
library(tidyverse)

# Bioinformatics packages
library(DESeq2)
library(pheatmap)
library(limma)
library(edgeR)

# Statistical learning and modeling
library(caret)
library(glmnet)
library(randomForest)

# Time-aware analysis
library(forecast)
library(tsibble)
library(fable)

# Annotation and interpretation
library(clusterProfiler)
library(org.Hs.eg.db)

# Helpful utilities
library(broom)
library(ggrepel)
library(pROC)

set.seed(123)

theme_set(
  theme_minimal(base_size = 13) +
    theme(
      plot.title = element_text(face = &quot;bold&quot;),
      axis.title = element_text(face = &quot;bold&quot;),
      panel.grid.minor = element_blank()
    )
)
</pre>

    <p>
      This package combination reflects a wider truth about digital biology with R: modern workflows are often hybrid. A project may start with RNA-seq counts, then move into clinical prediction, then require temporal modeling of follow-up measurements. The strongest analysts are increasingly those who can connect these stages seamlessly rather than treating them as separate disciplines.
    </p>
  </div>

  <div class="section">
    <h2>Importing Biological and Clinical Data</h2>

    <p>
      High-quality analysis begins with structured data ingestion. In digital biology, it is common to work with at least two linked datasets: a feature matrix and a metadata table. In transcriptomics, the feature matrix may contain genes by samples. In biomedical prediction, it may contain biomarkers, laboratory values, imaging scores, or derived molecular features. The metadata usually includes conditions, treatment groups, demographic variables, batch identifiers, time points, and outcomes.
    </p>

<pre># Read count matrix and sample metadata
counts &lt;- read.csv(&quot;gene_counts.csv&quot;, row.names = 1, check.names = FALSE)
metadata &lt;- read.csv(&quot;sample_metadata.csv&quot;, row.names = 1)

# Ensure samples align
metadata &lt;- metadata[colnames(counts), , drop = FALSE]

# Inspect dimensions
dim(counts)
dim(metadata)

# Preview data
head(counts[, 1:6])
head(metadata)

# Basic integrity checks
stopifnot(all(colnames(counts) == rownames(metadata)))
sum(is.na(counts))
sum(is.na(metadata))

# Explore metadata structure
str(metadata)
table(metadata$condition)
table(metadata$batch)
table(metadata$timepoint)
</pre>

    <p>
      At this stage, professionals should pause and inspect structure rather than rushing into modeling. Many downstream problems can be prevented here: sample misalignment, inconsistent labels, unbalanced groups, missing covariates, and silent import errors. In biological work, methodological discipline begins before the first plot is drawn.
    </p>
  </div>

  <div class="section">
    <h2>Quality Control and Filtering of Biological Features</h2>

    <p>
      Biological datasets often include features with very low information content. In RNA-seq, genes with extremely low counts contribute noise and inflate multiple-testing burden. In medical datasets, biomarkers with near-zero variance or severe missingness can destabilize models. Filtering is therefore not a cosmetic step. It is part of the inferential foundation.
    </p>

<pre># Total reads per sample
library_sizes &lt;- colSums(counts)
sort(library_sizes)

# Visualize library size distribution
library_df &lt;- tibble(
  sample = names(library_sizes),
  total_counts = library_sizes
)

ggplot(library_df, aes(x = reorder(sample, total_counts), y = total_counts)) +
  geom_col() +
  coord_flip() +
  labs(
    title = &quot;Library Size per Sample&quot;,
    x = &quot;Sample&quot;,
    y = &quot;Total Counts&quot;
  )

# Filter low-count genes
keep_genes &lt;- rowSums(counts &gt;= 10) &gt;= 3
counts_filtered &lt;- counts[keep_genes, ]

dim(counts_filtered)

# Optional: identify highly variable genes after transformation
log_counts &lt;- log2(counts_filtered + 1)
gene_variance &lt;- apply(log_counts, 1, var)

hv_genes &lt;- names(sort(gene_variance, decreasing = TRUE))[1:500]
length(hv_genes)
head(hv_genes)
</pre>

    <p>
      This step is often underestimated, yet it reflects one of the core principles of rigorous digital biology: not every measured variable deserves equal inferential attention. Careful preprocessing improves stability, interpretability, and signal detection.
    </p>
  </div>

  <div class="section">
    <h2>Differential Expression Analysis with DESeq2</h2>

    <p>
      A cornerstone task in digital biology is identifying features that differ systematically across biological conditions. In gene expression analysis, this is typically addressed with differential expression models. In R, <code>DESeq2</code> remains a leading framework because it pairs negative-binomial count modeling with robust normalization and clear inferential outputs.
    </p>

<pre># Build DESeq2 object
dds &lt;- DESeqDataSetFromMatrix(
  countData = counts_filtered,
  colData = metadata,
  design = ~ batch + condition
)

# Run differential expression pipeline
dds &lt;- DESeq(dds)

# Extract normalized counts
norm_counts &lt;- counts(dds, normalized = TRUE)

# Results for treatment vs control
res &lt;- results(dds, contrast = c(&quot;condition&quot;, &quot;treated&quot;, &quot;control&quot;))

# Order by adjusted p-value
res_ordered &lt;- res[order(res$padj), ]
res_df &lt;- as.data.frame(res_ordered) %&gt;%
  rownames_to_column(&quot;gene&quot;)

head(res_df)

# Summary of results
summary(res)

# Significant genes
sig_res &lt;- res_df %&gt;%
  filter(!is.na(padj), padj &lt; 0.05, abs(log2FoldChange) &gt; 1)

nrow(sig_res)
head(sig_res)
</pre>

    <p>
      The logic here is deeply aligned with professional bioinformatics practice. We are not merely searching for large fold changes. We are modeling expression while accounting for dispersion, library size, and design structure. When analysts speak about reliable biological signal, this statistical scaffolding is what makes the claim credible.
    </p>
  </div>

  <div class="section">
    <h2>Variance Stabilization and Exploratory Biological Patterns</h2>

    <p>
      Raw counts are appropriate for inference within count models, but transformed values are often more useful for exploratory analysis, clustering, and visualization. Variance stabilization helps reveal sample-level patterns that are obscured in count scale.
    </p>

<pre># Variance stabilizing transformation
vsd &lt;- vst(dds, blind = FALSE)
vsd_mat &lt;- assay(vsd)

# Principal component analysis
pca_data &lt;- plotPCA(vsd, intgroup = c(&quot;condition&quot;, &quot;batch&quot;), returnData = TRUE)
percent_var &lt;- round(100 * attr(pca_data, &quot;percentVar&quot;))

ggplot(pca_data, aes(PC1, PC2, color = condition, shape = batch, label = name)) +
  geom_point(size = 4) +
  geom_text_repel(size = 3.5, max.overlaps = 20) +
  labs(
    title = &quot;PCA of Variance-Stabilized Expression Data&quot;,
    x = paste0(&quot;PC1: &quot;, percent_var[1], &quot;% variance&quot;),
    y = paste0(&quot;PC2: &quot;, percent_var[2], &quot;% variance&quot;)
  )

# Sample-to-sample distance heatmap
sample_dists &lt;- dist(t(vsd_mat))
sample_dist_matrix &lt;- as.matrix(sample_dists)

pheatmap(
  sample_dist_matrix,
  clustering_distance_rows = sample_dists,
  clustering_distance_cols = sample_dists,
  main = &quot;Sample Distance Heatmap&quot;
)
</pre>

    <p>
      PCA and clustering are not just aesthetic additions. They answer fundamental questions: Do biological groups separate? Is there evidence of batch structure? Are any samples acting as outliers? In practice, these plots often determine whether a project moves forward confidently or returns to quality assessment.
    </p>
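    <p>
      As a minimal, self-contained sketch of the outlier question (simulated values, not the <code>vsd</code>-based objects computed above), one robust heuristic is to flag any sample whose median distance to all other samples sits far above the typical level:
    </p>

<pre># Simulated example: flag potential outlier samples from a distance matrix
# (illustrative only; in a real analysis, reuse the vsd-derived distances)
set.seed(42)
expr_sim &lt;- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
expr_sim[, 10] &lt;- expr_sim[, 10] + 3  # shift sample 10 to create an outlier

d_mat &lt;- as.matrix(dist(t(expr_sim)))
med_dist &lt;- apply(d_mat, 1, median)

# Robust threshold: median plus three median absolute deviations
threshold &lt;- median(med_dist) + 3 * mad(med_dist)
which(med_dist &gt; threshold)  # should flag the shifted sample
</pre>

    <p>
      The same three-line check transfers directly to the distance matrix built from real expression data, and it complements, rather than replaces, visual inspection of the PCA and heatmap.
    </p>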
  </div>

  <div class="section">
    <h2>Volcano Plots and Expression Heatmaps</h2>

    <p>
      Communication matters in digital biology. If results cannot be clearly visualized, they cannot be effectively interpreted, reviewed, or shared. Volcano plots and heatmaps remain two of the most useful ways to summarize differential signal.
    </p>

<pre># Volcano plot
volcano_df &lt;- res_df %&gt;%
  mutate(
    significance = case_when(
      !is.na(padj) &amp; padj &lt; 0.05 &amp; log2FoldChange &gt; 1  ~ &quot;Upregulated&quot;,
      !is.na(padj) &amp; padj &lt; 0.05 &amp; log2FoldChange &lt; -1 ~ &quot;Downregulated&quot;,
      TRUE ~ &quot;Not significant&quot;
    ),
    neg_log10_padj = -log10(padj)
  )

ggplot(volcano_df, aes(x = log2FoldChange, y = neg_log10_padj, color = significance)) +
  geom_point(alpha = 0.75) +
  geom_vline(xintercept = c(-1, 1), linetype = &quot;dashed&quot;) +
  geom_hline(yintercept = -log10(0.05), linetype = &quot;dashed&quot;) +
  labs(
    title = &quot;Volcano Plot of Differential Expression&quot;,
    x = &quot;Log2 Fold Change&quot;,
    y = &quot;-Log10 Adjusted P-value&quot;
  )

# Heatmap of top significant genes
top_genes &lt;- sig_res %&gt;%
  slice_min(order_by = padj, n = 30) %&gt;%
  pull(gene)

heatmap_mat &lt;- vsd_mat[top_genes, ]
heatmap_mat_scaled &lt;- t(scale(t(heatmap_mat)))

annotation_col &lt;- metadata %&gt;%
  select(condition, batch)

pheatmap(
  heatmap_mat_scaled,
  annotation_col = annotation_col,
  show_rownames = TRUE,
  show_colnames = TRUE,
  clustering_method = &quot;complete&quot;,
  main = &quot;Top Differentially Expressed Genes&quot;
)
</pre>

    <p>
      A strong digital biology report does not overwhelm the reader with raw output. Instead, it synthesizes significance, directionality, effect size, and group structure into visuals that support biological reasoning.
    </p>
  </div>

  <div class="section">
    <h2>From Omics to Biomedical Prediction</h2>

    <p>
      One of the most valuable evolutions in digital biology is the move from descriptive molecular analysis toward predictive modeling. This is where the boundary between bioinformatics and biomedical analytics becomes especially productive. Biological features can be used not only to explain differences between groups, but also to classify disease status, estimate risk, predict response, or support clinical stratification.
    </p>

    <p>
      This broader perspective is exactly why themes associated with
      <a href="https://rprogrammingbooks.com/product/healthcare-analytics-r-predictive-modeling-medical-data/" rel="nofollow" target="_blank"><strong>Healthcare Analytics with R</strong></a>
      are increasingly relevant to life scientists. Predictive modeling for medical data is not separate from digital biology. In many modern projects, it is the next analytical step after feature selection and biological characterization.
    </p>

<pre># Example: prepare a classification dataset using selected genes
selected_genes &lt;- sig_res %&gt;%
  slice_min(order_by = padj, n = 50) %&gt;%
  pull(gene)

model_df &lt;- as.data.frame(t(vsd_mat[selected_genes, ])) %&gt;%
  rownames_to_column(&quot;sample&quot;) %&gt;%
  left_join(
    # keep only the outcome column: other metadata (batch, timepoint, ...)
    # would otherwise leak into the predictors via condition ~ . - sample
    metadata %&gt;% rownames_to_column(&quot;sample&quot;) %&gt;% select(sample, condition),
    by = &quot;sample&quot;
  )

# Convert outcome to factor
model_df$condition &lt;- factor(model_df$condition)

# Train/test split
set.seed(123)
train_index &lt;- createDataPartition(model_df$condition, p = 0.8, list = FALSE)
train_data &lt;- model_df[train_index, ]
test_data  &lt;- model_df[-train_index, ]

# Logistic regression with regularization
x_train &lt;- model.matrix(condition ~ . - sample, data = train_data)[, -1]
y_train &lt;- train_data$condition

x_test &lt;- model.matrix(condition ~ . - sample, data = test_data)[, -1]
y_test &lt;- test_data$condition

cv_fit &lt;- cv.glmnet(
  x = x_train,
  y = y_train,
  family = &quot;binomial&quot;,
  alpha = 1
)

best_lambda &lt;- cv_fit$lambda.min
best_lambda

pred_prob &lt;- predict(cv_fit, newx = x_test, s = &quot;lambda.min&quot;, type = &quot;response&quot;)
pred_class &lt;- ifelse(pred_prob &gt; 0.5, levels(y_train)[2], levels(y_train)[1]) %&gt;%
  factor(levels = levels(y_train))

confusionMatrix(pred_class, y_test)

# ROC curve
roc_obj &lt;- roc(response = y_test, predictor = as.numeric(pred_prob))
auc(roc_obj)

plot(roc_obj, main = &quot;ROC Curve for Biomarker-Based Classification&quot;)
</pre>

    <p>
      This workflow illustrates an important professional principle: biological significance and predictive utility are related but not identical. A feature may be statistically different yet add little prediction. Conversely, a stable predictive combination may emerge from multiple modest features. Analysts in digital biology must be comfortable evaluating both dimensions.
    </p>
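    <p>
      The distinction can be made concrete with a small simulation (invented values, unrelated to the objects above): a modest mean shift is highly significant in a large sample, yet the same feature discriminates only weakly. Base R suffices, using the rank-sum statistic as an AUC estimate:
    </p>

<pre># Simulated feature with a small but real group difference
set.seed(1)
n &lt;- 500
group &lt;- rep(c(&quot;control&quot;, &quot;treated&quot;), each = n)
x &lt;- rnorm(2 * n, mean = ifelse(group == &quot;treated&quot;, 0.3, 0))

# Highly significant difference between groups...
t.test(x ~ group)$p.value

# ...but only modest discrimination: AUC from the Wilcoxon statistic
w &lt;- wilcox.test(x[group == &quot;treated&quot;], x[group == &quot;control&quot;])$statistic
auc_est &lt;- as.numeric(w) / (n * n)
auc_est  # well below the range usually considered clinically useful
</pre>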
  </div>

  <div class="section">
    <h2>Model Interpretation and Feature Importance</h2>

    <p>
      Predictive models become more useful when they can be interpreted responsibly. In biomedical contexts, this matters for scientific credibility, stakeholder communication, and eventual translational relevance.
    </p>

<pre># Random forest example
rf_model &lt;- randomForest(
  condition ~ . - sample,
  data = train_data,
  importance = TRUE,
  ntree = 500
)

rf_pred &lt;- predict(rf_model, newdata = test_data)
confusionMatrix(rf_pred, y_test)

# Variable importance
importance_df &lt;- importance(rf_model) %&gt;%
  as.data.frame() %&gt;%
  rownames_to_column(&quot;feature&quot;) %&gt;%
  arrange(desc(MeanDecreaseGini))

head(importance_df, 15)

ggplot(importance_df %&gt;% slice_max(order_by = MeanDecreaseGini, n = 15),
       aes(x = reorder(feature, MeanDecreaseGini), y = MeanDecreaseGini)) +
  geom_col() +
  coord_flip() +
  labs(
    title = &quot;Top Features by Random Forest Importance&quot;,
    x = &quot;Feature&quot;,
    y = &quot;Mean Decrease Gini&quot;
  )
</pre>

    <p>
      In practice, interpretability is not a single metric. It is a disciplined process of relating selected variables back to biological mechanisms, assay characteristics, experimental design, and disease context. This is where statistical maturity and domain understanding must meet.
    </p>
  </div>

  <div class="section">
    <h2>Time Series Analysis in Digital Biology</h2>

    <p>
      Not all biological processes are static snapshots. Many of the most interesting systems in biology unfold over time: circadian rhythms, immune response trajectories, treatment adaptation, tumor evolution, metabolic fluctuations, neural signals, and longitudinal patient outcomes. For this reason,
      <a href="https://rprogrammingbooks.com/product/time-series-analysis-in-r-book/" rel="nofollow" target="_blank"><strong>time series analysis with R</strong></a>
      is increasingly valuable in digital biology.
    </p>

    <p>
      The ability to model biological variation across time expands analysis beyond cross-sectional comparison. It enables trend detection, seasonality assessment, smoothing, short-term forecasting, and dynamic interpretation of living systems. Even a foundational understanding of temporal modeling can dramatically improve how a researcher handles repeated biological measurements.
    </p>

<pre># Example: longitudinal biomarker measurements
biomarker_ts &lt;- read.csv(&quot;biomarker_time_series.csv&quot;)

head(biomarker_ts)

# Suppose columns: date, patient_id, biomarker_value
biomarker_ts$date &lt;- as.Date(biomarker_ts$date)

# Aggregate mean biomarker value by date
daily_signal &lt;- biomarker_ts %&gt;%
  group_by(date) %&gt;%
  summarise(mean_value = mean(biomarker_value, na.rm = TRUE)) %&gt;%
  arrange(date)

ggplot(daily_signal, aes(x = date, y = mean_value)) +
  geom_line() +
  labs(
    title = &quot;Average Biomarker Signal Over Time&quot;,
    x = &quot;Date&quot;,
    y = &quot;Mean Biomarker Value&quot;
  )

# Convert to a time series object
# (frequency = 7 assumes daily observations with a weekly cycle;
#  adjust this to match the actual sampling design)
signal_ts &lt;- ts(daily_signal$mean_value, frequency = 7)

# Decomposition
signal_decomp &lt;- stl(signal_ts, s.window = &quot;periodic&quot;)
plot(signal_decomp)

# Automatic ARIMA model
fit_arima &lt;- auto.arima(signal_ts)
summary(fit_arima)

# Forecast next 14 periods
signal_forecast &lt;- forecast(fit_arima, h = 14)
plot(signal_forecast)
</pre>

    <p>
      This kind of analysis is highly relevant when biological response is not instantaneous. In translational research, temporal behavior may be more informative than a single endpoint. This is why learning patterns associated with
      <a href="https://rprogrammingbooks.com/product/time-series-analysis-in-r-book/" rel="nofollow" target="_blank">a practical guide to modeling and forecasting in R</a>
      can be unexpectedly powerful for biologists, especially when studying dynamic phenotypes.
    </p>
  </div>

  <div class="section">
    <h2>Gene-Level Temporal Analysis</h2>

    <p>
      Time-dependent biology also appears in omics. For example, gene expression may be measured before treatment, during exposure, and after recovery. In such cases, one can examine temporal structure directly at the feature level.
    </p>

<pre># Example metadata with repeated time points
# (DESeq2 and the tidyverse are required for this block)
library(DESeq2)
library(tidyverse)

metadata$timepoint &lt;- factor(metadata$timepoint, levels = c(&quot;T0&quot;, &quot;T1&quot;, &quot;T2&quot;, &quot;T3&quot;))

dds_time &lt;- DESeqDataSetFromMatrix(
  countData = counts_filtered,
  colData = metadata,
  design = ~ patient_id + timepoint
)

dds_time &lt;- DESeq(dds_time)

# Compare T3 vs T0
res_time &lt;- results(dds_time, contrast = c(&quot;timepoint&quot;, &quot;T3&quot;, &quot;T0&quot;))
res_time_df &lt;- as.data.frame(res_time) %&gt;%
  rownames_to_column(&quot;gene&quot;) %&gt;%
  filter(!is.na(padj)) %&gt;%
  arrange(padj)

head(res_time_df)

# Plot trajectories for selected genes
trajectory_genes &lt;- res_time_df %&gt;%
  slice_min(order_by = padj, n = 6) %&gt;%
  pull(gene)

traj_df &lt;- vsd_mat[trajectory_genes, ] %&gt;%
  as.data.frame() %&gt;%
  rownames_to_column(&quot;gene&quot;) %&gt;%
  pivot_longer(-gene, names_to = &quot;sample&quot;, values_to = &quot;expression&quot;) %&gt;%
  left_join(metadata %&gt;% rownames_to_column(&quot;sample&quot;), by = &quot;sample&quot;)

ggplot(traj_df, aes(x = timepoint, y = expression, group = patient_id, color = patient_id)) +
  geom_line(alpha = 0.7) +
  geom_point() +
  facet_wrap(~ gene, scales = &quot;free_y&quot;) +
  labs(
    title = &quot;Gene Expression Trajectories Across Time&quot;,
    x = &quot;Time Point&quot;,
    y = &quot;Variance-Stabilized Expression&quot;
  )
</pre>

    <p>
      These trajectory plots are especially informative because they convert abstract significance into temporal biological behavior. They help answer questions such as whether a gene responds early, accumulates gradually, reverses later, or varies strongly between individuals.
    </p>
  </div>

  <div class="section">
    <h2>Functional Interpretation and Pathway Enrichment</h2>

    <p>
      Lists of significant genes are not the final product of digital biology. They are intermediate artifacts. Real insight emerges when molecular changes are interpreted in the context of biological pathways, cellular functions, and disease mechanisms.
    </p>

<pre># Convert gene symbols to ENTREZ IDs
# (clusterProfiler and the org.Hs.eg.db annotation package are required)
library(clusterProfiler)
library(org.Hs.eg.db)

gene_symbols &lt;- sig_res$gene

gene_map &lt;- bitr(
  gene_symbols,
  fromType = &quot;SYMBOL&quot;,
  toType = &quot;ENTREZID&quot;,
  OrgDb = org.Hs.eg.db
)

head(gene_map)

# GO enrichment
ego &lt;- enrichGO(
  gene = gene_map$ENTREZID,
  OrgDb = org.Hs.eg.db,
  ont = &quot;BP&quot;,
  pAdjustMethod = &quot;BH&quot;,
  pvalueCutoff = 0.05,
  qvalueCutoff = 0.05,
  readable = TRUE
)

head(as.data.frame(ego))

# Dotplot
dotplot(ego, showCategory = 15)

# KEGG enrichment
ekegg &lt;- enrichKEGG(
  gene = gene_map$ENTREZID,
  organism = &quot;hsa&quot;,
  pvalueCutoff = 0.05
)

head(as.data.frame(ekegg))
barplot(ekegg, showCategory = 10)
</pre>

    <p>
      Pathway-level interpretation anchors the analysis in biology rather than leaving it at the level of statistical output. This is essential when communicating results to collaborators in wet-lab biology, medicine, translational research, or biotech development.
    </p>
  </div>

  <div class="section">
    <h2>Reproducibility, Reporting, and Professional Standards</h2>

    <p>
      One of the defining marks of professional digital biology is reproducibility. Analyses should be re-runnable, traceable, and explainable. In practice, this means using scripts instead of manual spreadsheets, versioning code, recording package versions, and structuring outputs clearly.
    </p>

<pre># Save key results
write.csv(res_df, &quot;deseq2_results_full.csv&quot;, row.names = FALSE)
write.csv(sig_res, &quot;deseq2_significant_genes.csv&quot;, row.names = FALSE)
write.csv(importance_df, &quot;model_feature_importance.csv&quot;, row.names = FALSE)

# Save transformed matrix
write.csv(as.data.frame(vsd_mat), &quot;variance_stabilized_expression.csv&quot;)

# Session information
sessionInfo()
</pre>
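
    <p>
      Recording <code>sessionInfo()</code> output helps, but package versions can also be pinned per project. A common approach is the <code>renv</code> package; the calls below are a minimal sketch and assume <code>renv</code> is installed.
    </p>

<pre># Pin package versions with renv (sketch; requires the renv package)
# install.packages(&quot;renv&quot;)

renv::init()      # create a project-local library and lockfile
renv::snapshot()  # record current package versions in renv.lock
renv::restore()   # later, or on another machine: reinstall recorded versions</pre>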

    <p>
      Many high-quality biological analyses fail to create lasting value because they are difficult to audit or reproduce. Strong R workflows help solve that problem. This is one reason digital biology with R continues to matter not just for discovery, but for scientific integrity.
    </p>
  </div>

  <div class="section">
    <h2>Strategic Perspective: Why Digital Biology Needs Both Prediction and Time</h2>

    <p>
      There is a broader lesson running through all of these workflows. Modern digital biology is no longer limited to one analytical mindset. It requires the integration of molecular inference, biomedical prediction, and dynamic temporal thinking. A researcher who can identify differentially expressed genes but cannot evaluate predictive performance is incomplete. A modeler who can classify patients but ignores longitudinal structure may miss the real biology. A statistician who can forecast signals but cannot relate them to biological pathways risks analytical abstraction without scientific relevance.
    </p>

    <p>
      This is why the most valuable R skill set in life sciences increasingly spans multiple domains. A foundation in bioinformatics remains essential, but it becomes even more powerful when complemented by competencies associated with
      <a href="https://rprogrammingbooks.com/product/healthcare-analytics-r-predictive-modeling-medical-data/" rel="nofollow" target="_blank">healthcare analytics with R</a>
      and
      <a href="https://rprogrammingbooks.com/product/time-series-analysis-in-r-book/" rel="nofollow" target="_blank">time series modeling and forecasting in R</a>.
      That combination reflects the real direction of modern computational biology.
    </p>
  </div>

  <div class="closing">
    <h2>Conclusion</h2>

    <p>
      Digital biology with R is not just about coding. It is about building a disciplined analytical framework for understanding living systems through data. From transcriptomics and pathway analysis to medical prediction and temporal biomarker modeling, R provides the professional infrastructure needed to move from raw measurements to defensible insight.
    </p>

    <p>
      The future of biological research belongs increasingly to people who can connect statistical rigor, computational reproducibility, and biological interpretation. In that environment, skills related to <strong>bioinformatics in R</strong>, <strong>predictive modeling for medical data</strong>, and <strong>time series analysis with R</strong> are not separate tracks. They are complementary pillars of modern digital biology.
    </p>

    <p>
      For scientists, analysts, clinicians, and interdisciplinary teams looking to strengthen that capability, learning how to combine these approaches is one of the smartest investments they can make. R remains one of the best places to do that work.
    </p>
  </div>

</body>
</html>
<p>The post <a href="https://rprogrammingbooks.com/digital-biology-with-r/" rel="nofollow" target="_blank">Digital Biology with R: Advanced Bioinformatics, Predictive Modeling, and Time Series Analysis for Modern Life Sciences</a> appeared first on <a href="https://rprogrammingbooks.com/" rel="nofollow" target="_blank">R Programming Books</a>.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://rprogrammingbooks.com/digital-biology-with-r/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=digital-biology-with-r"> Blog - R Programming Books</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/digital-biology-with-r-advanced-bioinformatics-predictive-modeling-and-time-series-analysis-for-modern-life-sciences/">Digital Biology with R: Advanced Bioinformatics, Predictive Modeling, and Time Series Analysis for Modern Life Sciences</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400051</post-id>	</item>
		<item>
		<title>Using science to find the best decaf</title>
		<link>https://www.r-bloggers.com/2026/03/using-science-to-find-the-best-decaf/</link>
		
		<dc:creator><![CDATA[Giles Dickenson-Jones]]></dc:creator>
		<pubDate>Mon, 23 Mar 2026 04:03:08 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://www.gilesd-j.com/?p=4177</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> TLDR: To test whether I could tell the difference between decaf coffees I conducted a highly scientific test (subject to […]<br />
The post Using science to find the best decaf appeared first on Giles.</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/using-science-to-find-the-best-decaf/">Using science to find the best decaf</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.gilesd-j.com/2026/03/23/using-science-to-find-the-best-decaf/"> Data Analytics and AI Archives - Giles</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<p class=""><strong>TLDR:</strong> To test whether I could tell the difference between decaf coffees I conducted a highly scientific test (subject to funding constraints).</p>



<p class="">One of my goals for 2025 was reducing my caffeine intake after having one too many sleepless nights. The problem was that all the decaffeinated coffee I’d tried was terrible.</p>



<p class="">Or was it?</p>



<p class="">After all, I once called myself an audiophile until a <a href="https://www.npr.org/sections/therecord/2015/06/02/411473508/how-well-can-you-hear-audio-quality" rel="nofollow" target="_blank">series of A/B tests</a> suggested I couldn’t tell the difference between tracks encoded at different bitrates. So, it was entirely possible that I’d been brainwashed by <em>big coffee </em>to believe decaf coffee was inferior.</p>



<p class="">But, how exactly could we test this out?</p>



<p class="">The most obvious solution was to run a cross-country randomized double-blind experiment. This way, I wouldn’t automatically base my assessment on the caffeinated status of the coffee <em>and</em> could instead focus on my subjective rating of the quality of each coffee.</p>



<p class="">Which is pretty much what I did:</p>



<h2 class="wp-block-heading">Step 1: Sample selection</h2>



<p class="">The first step was to select a wide enough sample of coffee beans to make the study as sciency as possible. Roping in my wife to help out, we purchased as many decaf varieties as we could get our hands on.</p>



<h2 class="wp-block-heading">Step 2: Sample blinding</h2>



<p class="">After selecting a large representative sample of coffees (n=6), I packed a sample of each in its own container (pictured). To obscure each coffee’s identity I assigned them a number from 1 to 6. To further enhance the science I then had my wife assign new numbers so neither of us knew the origin of each sample.</p>
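
<p class="">The two-round relabelling is easy to sketch in base R (the labels below are hypothetical; ours were assigned by hand):</p>

<pre># Double-blind labelling sketch (illustrative; real labels were hand-assigned)
set.seed(1)
coffees &lt;- paste0(&quot;coffee_&quot;, 1:6)

round_1 &lt;- sample(1:6)          # my numbering
round_2 &lt;- sample(1:6)          # my wife’s re-numbering of my numbers

# Neither round alone maps a cup back to its coffee;
# the combined key is needed to unblind the results
blinding_key &lt;- data.frame(coffee = coffees,
                           blind_label_round_1 = round_1,
                           blind_label_round_2 = round_2[round_1])</pre>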



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://i1.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/clipboard-1457285144.jpeg?w=450&#038;ssl=1" alt="" class="wp-image-4179" style="width:405px;height:auto" srcset_temp="https://i1.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/clipboard-1457285144.jpeg?w=450&#038;ssl=1 675w, https://www.gilesd-j.com/wp-content/uploads/2026/03/clipboard-1457285144-300x267.jpeg 300w" sizes="auto, (max-width: 675px) 100vw, 675px" data-recalc-dims="1" /></figure>



<p class=""> </p>



<h2 class="wp-block-heading">Step 3: Testing</h2>



<p class="">Before starting the test I cleaned and descaled the coffee machine. Beans from each container were freshly ground at room temperature and used to make six separate espressos. Shots were drawn on a quasi-random basis according to whatever my wife handed to me. We then took a sip of each coffee and ranked our preferences from one to six.</p>



<figure class="wp-block-image aligncenter size-full"><img loading="lazy" decoding="async" width="407" height="321" src="https://i2.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/clipboard-759299450.jpeg?resize=407%2C321&#038;ssl=1" alt="" class="wp-image-4181" srcset_temp="https://i2.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/clipboard-759299450.jpeg?resize=407%2C321&#038;ssl=1 407w, https://www.gilesd-j.com/wp-content/uploads/2026/03/clipboard-759299450-300x237.jpeg 300w" sizes="auto, (max-width: 407px) 100vw, 407px" data-recalc-dims="1" /></figure>



<p class=""> </p>



<h2 class="wp-block-heading">Step 4: Results</h2>



<p class="">Although I’d have liked to pre-register my research, none of the top econometric journals I contacted expressed interest. However, my running assumption was that our preferences for a coffee were mainly psychological and had little to do with its caffeine content.</p>



<p class="">If this were true, I’d expect to see no relationship between our rankings. But, to my surprise, this didn’t appear to be the case. Instead, we both ranked the beans in a similar order:</p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://i2.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/image.png?w=450&#038;ssl=1" alt="" class="wp-image-4183" style="width:633px;height:auto" srcset_temp="https://i2.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/image.png?w=450&#038;ssl=1 745w, https://www.gilesd-j.com/wp-content/uploads/2026/03/image-300x217.png 300w" sizes="auto, (max-width: 745px) 100vw, 745px" data-recalc-dims="1" /></figure>



<p class=""> </p>



<p class=""><strong>Code Snippet:</strong></p>



<pre>#load libraries and import data
library(tidyverse)
dta_coffee_science&lt;-read_csv(&quot;./Data/250216 blind coffee ratings.csv&quot;)

# Show linear association between samples by assigned label
#reverse axis so lower rankings are higher on the axis scale
plt_rankings_by_coffee_no &lt;- ggplot(data = dta_coffee_science,
                                  aes(y = ranking_person_b, x = ranking_person_a)) +
 geom_text(aes(label = blind_label_round_2), size = 3.5) +
 scale_y_reverse(name = &quot;Person B Ranking (1 = Best)&quot;) +
 scale_x_reverse(name = &quot;Person A Ranking (1 = Best)&quot;) + 
 labs(title = &quot;Coffee Sample Rankings: Person A vs Person B&quot;,
      subtitle = &quot;Double-blind taste test results of brewed coffee samples&quot;,
      caption = &quot;Note: Lower numbers indicate higher preference&quot;) +
 theme_classic()


plt_rankings_by_coffee_no</pre>



<p class="">Of course, we’re doing some real science here, so to check, let’s apply Kendall’s tau and Spearman’s rank correlation tests against the null hypothesis that there is no association between our rankings.</p>



<p class=""><strong>Code Snippet:</strong></p>



<pre>#kendall
cor.test(data=dta_coffee_science, 
         ~ ranking_person_a + ranking_person_b, method = &quot;kendall&quot;)

#spearman
cor.test(data=dta_coffee_science, 
         ~ ranking_person_a + ranking_person_b, method = &quot;spearman&quot;)</pre>



<p class="">With p-values from six to eight percent, this isn’t a ringing endorsement of the results, but having already written the blog I’m happy to adjust my definition of significant to conclude our preferences were similar to one another.</p>



<figure class="wp-block-image aligncenter size-full"><img loading="lazy" decoding="async" width="259" height="365" src="https://i2.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/significant.png?resize=259%2C365&#038;ssl=1" alt="" class="wp-image-4185" srcset_temp="https://i2.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/significant.png?resize=259%2C365&#038;ssl=1 259w, https://www.gilesd-j.com/wp-content/uploads/2026/03/significant-213x300.png 213w" sizes="auto, (max-width: 259px) 100vw, 259px" data-recalc-dims="1" /><figcaption class="wp-element-caption"><a href="https://xkcd.com/1478/" rel="nofollow" target="_blank">Source</a></figcaption></figure>



<p class="">Of course, my willingness to play fast and loose with the stats also stems from knowing a key result: <em>we both ranked the store-bought caffeinated beans highest.</em></p>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://i1.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/image-1.png?w=450&#038;ssl=1" alt="" class="wp-image-4187" style="width:605px;height:auto" srcset_temp="https://i1.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/image-1.png?w=450&#038;ssl=1 745w, https://www.gilesd-j.com/wp-content/uploads/2026/03/image-1-300x217.png 300w" sizes="auto, (max-width: 745px) 100vw, 745px" data-recalc-dims="1" /></figure>



<p class=""> </p>



<p class=""><strong>Code Snippet: </strong></p>



<pre># Show linear association between samples by assigned label and caffeination status 
plt_rankings_by_caffeine &lt;-ggplot(data=dta_coffee_science,
                     aes(y=ranking_person_b, x=ranking_person_a,col=decaf))+
 geom_text(aes(label = blind_label_round_2), size = 3.5) +
  scale_y_reverse(limits = c(7.5, 0.5)) +   
  scale_x_reverse(limits = c(7.5, 0.5)) +  
    coord_cartesian(clip = &quot;off&quot;) +  
 labs(title = &quot;Coffee Sample Rankings: Person A vs Person B&quot;,
      subtitle = &quot;Double-blind taste test results of brewed coffee samples&quot;,
      caption = &quot;Note: Lower numbers indicate higher preference&quot;,
      x=&quot;Person A Ranking (1 = Best)&quot;, 
      y=&quot;Person B Ranking (1 = Best)&quot;) +
 theme_classic()+
 scale_color_manual(values = c(&quot;black&quot;, &quot;blue&quot;), name = &quot;Decaf:&quot;) 

plt_rankings_by_caffeine</pre>



<p class="">I also found it surprising that the beans from a specialized provider of decaf weren’t necessarily ranked higher, with only one of their beans ranked in the top three:</p>



<p class=""><strong>Code Snippet:</strong></p>



<pre># Show linear association between samples by assigned label and caffeination status with original labels
plt_rankings_by_caffeine_named &lt;- ggplot(data = dta_coffee_science,
                     aes(y = ranking_person_b, x = ranking_person_a, col = decaf)) +
  geom_text(aes(label = str_wrap(paste0(coffee_brand, &quot;: &quot;, coffee_name), width = 15)), 
            size = 3.5, lineheight = 0.8) +
  scale_y_reverse(limits = c(7.5, 0.5)) +   
  scale_x_reverse(limits = c(7.5, 0.5)) +   
  coord_cartesian(clip = &quot;off&quot;) +            # stop clipping text at panel border
  labs(title = &quot;Coffee Sample Rankings: Person A vs Person B&quot;,
       subtitle = &quot;Double-blind taste test results of brewed coffee samples&quot;,
       caption = &quot;Note: Lower numbers indicate higher preference&quot;,
       x = &quot;Person A Ranking (1 = Best)&quot;, 
       y = &quot;Person B Ranking (1 = Best)&quot;) +
  theme_classic() +
  theme(plot.margin = margin(10, 60, 10, 60)) +  
  scale_color_manual(values = c(&quot;black&quot;, &quot;blue&quot;), name = &quot;Decaffeinated&quot;)

plt_rankings_by_caffeine_named</pre>



<figure class="wp-block-image aligncenter size-full is-resized"><img loading="lazy" decoding="async" src="https://i0.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/image-2.png?w=450&#038;ssl=1" alt="" class="wp-image-4189" style="width:633px;height:auto" srcset_temp="https://i0.wp.com/www.gilesd-j.com/wp-content/uploads/2026/03/image-2.png?w=450&#038;ssl=1 755w, https://www.gilesd-j.com/wp-content/uploads/2026/03/image-2-300x234.png 300w" sizes="auto, (max-width: 755px) 100vw, 755px" data-recalc-dims="1" /></figure>



<p class=""> </p>



<p class="">And while <a href="https://www.gourmettraveller.com.au/dining-out/food-news/does-decaf-coffee-taste-as-good-as-regular-coffee-2733/" rel="nofollow" target="_blank">these coffee nerds</a> might disagree, the results suggest we can tell the difference between coffees and both prefer the caffeinated alternative.</p>



<p class="">When I recounted the result to a food chemist they told me this probably has something to do with decaf coffee lacking <a href="https://journals.sagepub.com/doi/10.1177/0003489420906187" rel="nofollow" target="_blank">the bitterness of caffeine</a>. </p>



<p class="">When I recounted the results to my wife, she told me to never waste her time like this again. I probably will. </p>



<p class="">In the spirit of open science, you can download the dataset <a href="https://www.gilesd-j.com/shared_resources/blogs/260323%20Coffee/250216%20blind%20coffee%20ratings.csv" rel="nofollow" target="_blank">here</a>.</p>



<p class=""></p>
<p>The post <a href="https://www.gilesd-j.com/2026/03/23/using-science-to-find-the-best-decaf/" rel="nofollow" target="_blank">Using science to find the best decaf</a> appeared first on <a href="https://www.gilesd-j.com/" rel="nofollow" target="_blank">Giles</a>.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.gilesd-j.com/2026/03/23/using-science-to-find-the-best-decaf/"> Data Analytics and AI Archives - Giles</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/using-science-to-find-the-best-decaf/">Using science to find the best decaf</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400045</post-id>	</item>
		<item>
		<title>odds_summary: Turning Probabilistic Estimates into Clear, Decision-Ready Insights</title>
		<link>https://www.r-bloggers.com/2026/03/odds_summary-turning-probabilistic-estimates-into-clear-decision-ready-insights/</link>
		
		<dc:creator><![CDATA[R-Blog on Data modelling to develop ...]]></dc:creator>
		<pubDate>Mon, 23 Mar 2026 01:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">/r-blog/2026-03-23-r-markdown/oddsratio/</guid>

					<description><![CDATA[<p>Introduction<br />
Model tuning and estimation has evolved from simple extrapolation to sophisticated probabilistic modeling frameworks. In contemporary data science, decision-makers require more than estimates, the need for clear statements about likeliho...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/odds_summary-turning-probabilistic-estimates-into-clear-decision-ready-insights/">odds_summary: Turning Probabilistic Estimates into Clear, Decision-Ready Insights</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://www.jobnmadu.com/r-blog/2026-03-23-r-markdown/oddsratio/"> R-Blog on Data modelling to develop ...</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>

<div id="introduction" class="section level1">
<h1>Introduction</h1>
<p>Model tuning and estimation have evolved from simple extrapolation to sophisticated probabilistic modeling frameworks. In contemporary data science, decision-makers require more than point estimates: clear statements about likelihood, risk, and uncertainty are critical to sound decision-making. Yet despite advances in predictive modeling, it remains hard to make quick, robust decisions from estimated probabilities, because the estimates are often not meaningfully interpreted. This is where the odds_summary function becomes strategically important. It converts probabilistic outputs into structured summaries that directly support:</p>
<blockquote>
<p>• decision-making</p>
</blockquote>
<blockquote>
<p>• risk communication</p>
</blockquote>
<blockquote>
<p>• model validation</p>
</blockquote>
<blockquote>
<p>• reproducible research</p>
</blockquote>
<p>In practical terms, it turns numbers into evidence.</p>
<p>The function is implemented as follows:</p>
<blockquote>
<p>odds_summary(model)</p>
</blockquote>
<p>where:</p>
<p><code>model</code>: a fitted model object for one of the supported model types. For now, only <code>glm</code>, <code>multinom</code>, and <code>polr</code> models are covered.</p>
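<p>To read the output, note that the odds ratio is simply the exponentiated coefficient, and the percentage column expresses that ratio as a percent change in the odds. A quick base R check (using the rounded InflMedium coefficient from the table below; small differences from the table are rounding):</p>
<pre>b &lt;- 0.566                            # rounded coefficient
odds_ratio &lt;- exp(b)                  # about 1.76 (Odds_ratio column)
pct_change &lt;- (odds_ratio - 1) * 100  # about 76% (% column)</pre>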
<div id="implement-the-function" class="section level2">
<h2>Implement the function</h2>
<pre>library(Dyn4cast)
library(tidyverse)</pre>
<div id="ordered-logistic-model" class="section level3">
<h3>Ordered Logistic Model</h3>
<pre>library(MASS)
options(contrasts = c(&quot;contr.treatment&quot;, &quot;contr.poly&quot;))
house.plr &lt;- polr(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
modelsummary::datasummary_df(odds_summary(house.plr))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_0y93rh7rqqwcli5evcjb = TinyTable.createTableFunctions("tinytable_0y93rh7rqqwcli5evcjb");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '8', j: 1 }, { i: '8', j: 2 }, { i: '8', j: 3 }, { i: '8', j: 4 }, { i: '8', j: 5 }, { i: '8', j: 6 }, { i: '8', j: 7 }, { i: '8', j: 8 }, { i: '8', j: 9 }, { i: '8', j: 10 }, { i: '8', j: 11 } ], css_id: 'tinytable_css_kqdivy1ouvxmjyvd8rvd',},
{ positions: [ { i: '1', j: 1 }, { i: '2', j: 1 }, { i: '3', j: 1 }, { i: '4', j: 1 }, { i: '5', j: 1 }, { i: '6', j: 1 }, { i: '7', j: 1 }, { i: '1', j: 2 }, { i: '2', j: 2 }, { i: '3', j: 2 }, { i: '4', j: 2 }, { i: '5', j: 2 }, { i: '6', j: 2 }, { i: '7', j: 2 }, { i: '1', j: 3 }, { i: '2', j: 3 }, { i: '3', j: 3 }, { i: '4', j: 3 }, { i: '5', j: 3 }, { i: '6', j: 3 }, { i: '7', j: 3 }, { i: '1', j: 4 }, { i: '2', j: 4 }, { i: '3', j: 4 }, { i: '4', j: 4 }, { i: '5', j: 4 }, { i: '6', j: 4 }, { i: '7', j: 4 }, { i: '1', j: 5 }, { i: '2', j: 5 }, { i: '3', j: 5 }, { i: '4', j: 5 }, { i: '5', j: 5 }, { i: '6', j: 5 }, { i: '7', j: 5 }, { i: '1', j: 6 }, { i: '2', j: 6 }, { i: '3', j: 6 }, { i: '4', j: 6 }, { i: '5', j: 6 }, { i: '6', j: 6 }, { i: '7', j: 6 }, { i: '1', j: 7 }, { i: '2', j: 7 }, { i: '3', j: 7 }, { i: '4', j: 7 }, { i: '5', j: 7 }, { i: '6', j: 7 }, { i: '7', j: 7 }, { i: '1', j: 8 }, { i: '2', j: 8 }, { i: '3', j: 8 }, { i: '4', j: 8 }, { i: '5', j: 8 }, { i: '6', j: 8 }, { i: '7', j: 8 }, { i: '1', j: 9 }, { i: '2', j: 9 }, { i: '3', j: 9 }, { i: '4', j: 9 }, { i: '5', j: 9 }, { i: '6', j: 9 }, { i: '7', j: 9 }, { i: '1', j: 10 }, { i: '2', j: 10 }, { i: '3', j: 10 }, { i: '4', j: 10 }, { i: '5', j: 10 }, { i: '6', j: 10 }, { i: '7', j: 10 }, { i: '1', j: 11 }, { i: '2', j: 11 }, { i: '3', j: 11 }, { i: '4', j: 11 }, { i: '5', j: 11 }, { i: '6', j: 11 }, { i: '7', j: 11 } ], css_id: 'tinytable_css_0n5siff8v55citkwicfb',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_wkqke0b7agbredbwastx',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_0y93rh7rqqwcli5evcjb.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_0y93rh7rqqwcli5evcjb td.tinytable_css_kqdivy1ouvxmjyvd8rvd, #tinytable_0y93rh7rqqwcli5evcjb th.tinytable_css_kqdivy1ouvxmjyvd8rvd { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_0y93rh7rqqwcli5evcjb td.tinytable_css_0n5siff8v55citkwicfb, #tinytable_0y93rh7rqqwcli5evcjb th.tinytable_css_0n5siff8v55citkwicfb { text-align: left }
#tinytable_0y93rh7rqqwcli5evcjb td.tinytable_css_wkqke0b7agbredbwastx, #tinytable_0y93rh7rqqwcli5evcjb th.tinytable_css_wkqke0b7agbredbwastx { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_0y93rh7rqqwcli5evcjb" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">InflMedium</td>
<td data-row="1" data-col="2">0.57</td>
<td data-row="1" data-col="3">0.10</td>
<td data-row="1" data-col="4">5.41</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">0.566***</td>
<td data-row="1" data-col="7">1.76</td>
<td data-row="1" data-col="8">76.19</td>
<td data-row="1" data-col="9">1.762***</td>
<td data-row="1" data-col="10">1.44</td>
<td data-row="1" data-col="11">2.16</td>
</tr>
<tr>
<td data-row="2" data-col="1">InflHigh</td>
<td data-row="2" data-col="2">1.29</td>
<td data-row="2" data-col="3">0.13</td>
<td data-row="2" data-col="4">10.14</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">1.289***</td>
<td data-row="2" data-col="7">3.63</td>
<td data-row="2" data-col="8">262.85</td>
<td data-row="2" data-col="9">3.628***</td>
<td data-row="2" data-col="10">2.83</td>
<td data-row="2" data-col="11">4.66</td>
</tr>
<tr>
<td data-row="3" data-col="1">TypeApartment</td>
<td data-row="3" data-col="2">-0.57</td>
<td data-row="3" data-col="3">0.12</td>
<td data-row="3" data-col="4">-4.80</td>
<td data-row="3" data-col="5">0.00</td>
<td data-row="3" data-col="6">-0.572***</td>
<td data-row="3" data-col="7">0.56</td>
<td data-row="3" data-col="8">-43.58</td>
<td data-row="3" data-col="9">0.564***</td>
<td data-row="3" data-col="10">0.45</td>
<td data-row="3" data-col="11">0.71</td>
</tr>
<tr>
<td data-row="4" data-col="1">TypeAtrium</td>
<td data-row="4" data-col="2">-0.37</td>
<td data-row="4" data-col="3">0.16</td>
<td data-row="4" data-col="4">-2.36</td>
<td data-row="4" data-col="5">0.02</td>
<td data-row="4" data-col="6">-0.366*</td>
<td data-row="4" data-col="7">0.69</td>
<td data-row="4" data-col="8">-30.66</td>
<td data-row="4" data-col="9">0.693*</td>
<td data-row="4" data-col="10">0.51</td>
<td data-row="4" data-col="11">0.94</td>
</tr>
<tr>
<td data-row="5" data-col="1">TypeTerrace</td>
<td data-row="5" data-col="2">-1.09</td>
<td data-row="5" data-col="3">0.15</td>
<td data-row="5" data-col="4">-7.20</td>
<td data-row="5" data-col="5">0.00</td>
<td data-row="5" data-col="6">-1.091***</td>
<td data-row="5" data-col="7">0.34</td>
<td data-row="5" data-col="8">-66.41</td>
<td data-row="5" data-col="9">0.336***</td>
<td data-row="5" data-col="10">0.25</td>
<td data-row="5" data-col="11">0.45</td>
</tr>
<tr>
<td data-row="6" data-col="1">ContHigh</td>
<td data-row="6" data-col="2">0.36</td>
<td data-row="6" data-col="3">0.10</td>
<td data-row="6" data-col="4">3.77</td>
<td data-row="6" data-col="5">0.00</td>
<td data-row="6" data-col="6">0.36***</td>
<td data-row="6" data-col="7">1.43</td>
<td data-row="6" data-col="8">43.37</td>
<td data-row="6" data-col="9">1.434***</td>
<td data-row="6" data-col="10">1.19</td>
<td data-row="6" data-col="11">1.73</td>
</tr>
<tr>
<td data-row="7" data-col="1">Low|Medium</td>
<td data-row="7" data-col="2">-0.50</td>
<td data-row="7" data-col="3">0.12</td>
<td data-row="7" data-col="4">-3.97</td>
<td data-row="7" data-col="5">0.00</td>
<td data-row="7" data-col="6">-0.496***</td>
<td data-row="7" data-col="7">0.61</td>
<td data-row="7" data-col="8">-39.11</td>
<td data-row="7" data-col="9">0.609***</td>
<td data-row="7" data-col="10">-0.74</td>
<td data-row="7" data-col="11">-0.25</td>
</tr>
<tr>
<td data-row="8" data-col="1">Medium|High</td>
<td data-row="8" data-col="2">0.69</td>
<td data-row="8" data-col="3">0.13</td>
<td data-row="8" data-col="4">5.50</td>
<td data-row="8" data-col="5">0.00</td>
<td data-row="8" data-col="6">0.691***</td>
<td data-row="8" data-col="7">2.00</td>
<td data-row="8" data-col="8">99.51</td>
<td data-row="8" data-col="9">1.995***</td>
<td data-row="8" data-col="10">0.44</td>
<td data-row="8" data-col="11">0.94</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
</div>
</div>
<div id="glm-models" class="section level2">
<h2>glm models</h2>
<pre>counts &lt;- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome &lt;- gl(3, 1, 9)
treatment &lt;- gl(3, 3)
ddc &lt;- data.frame(treatment, outcome, counts) # assemble the data frame
glm.D93 &lt;- glm(counts ~ ., data = ddc, family = poisson())
modelsummary::datasummary_df(odds_summary(glm.D93))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_12d3s66vdraiwt31oplu = TinyTable.createTableFunctions("tinytable_12d3s66vdraiwt31oplu");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '5', j: 1 }, { i: '5', j: 2 }, { i: '5', j: 3 }, { i: '5', j: 4 }, { i: '5', j: 5 }, { i: '5', j: 6 }, { i: '5', j: 7 }, { i: '5', j: 8 }, { i: '5', j: 9 }, { i: '5', j: 10 }, { i: '5', j: 11 } ], css_id: 'tinytable_css_daa6amfft7ltkyt107ix',},
{ positions: [ { i: '1', j: 1 }, { i: '2', j: 1 }, { i: '3', j: 1 }, { i: '4', j: 1 }, { i: '1', j: 2 }, { i: '2', j: 2 }, { i: '3', j: 2 }, { i: '4', j: 2 }, { i: '1', j: 3 }, { i: '2', j: 3 }, { i: '3', j: 3 }, { i: '4', j: 3 }, { i: '1', j: 4 }, { i: '2', j: 4 }, { i: '3', j: 4 }, { i: '4', j: 4 }, { i: '1', j: 5 }, { i: '2', j: 5 }, { i: '3', j: 5 }, { i: '4', j: 5 }, { i: '1', j: 6 }, { i: '2', j: 6 }, { i: '3', j: 6 }, { i: '4', j: 6 }, { i: '1', j: 7 }, { i: '2', j: 7 }, { i: '3', j: 7 }, { i: '4', j: 7 }, { i: '1', j: 8 }, { i: '2', j: 8 }, { i: '3', j: 8 }, { i: '4', j: 8 }, { i: '1', j: 9 }, { i: '2', j: 9 }, { i: '3', j: 9 }, { i: '4', j: 9 }, { i: '1', j: 10 }, { i: '2', j: 10 }, { i: '3', j: 10 }, { i: '4', j: 10 }, { i: '1', j: 11 }, { i: '2', j: 11 }, { i: '3', j: 11 }, { i: '4', j: 11 } ], css_id: 'tinytable_css_kom2rjrasnbp39sx7eae',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_6xdoabik2wy8rhcbyouv',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_12d3s66vdraiwt31oplu.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_12d3s66vdraiwt31oplu td.tinytable_css_daa6amfft7ltkyt107ix, #tinytable_12d3s66vdraiwt31oplu th.tinytable_css_daa6amfft7ltkyt107ix { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_12d3s66vdraiwt31oplu td.tinytable_css_kom2rjrasnbp39sx7eae, #tinytable_12d3s66vdraiwt31oplu th.tinytable_css_kom2rjrasnbp39sx7eae { text-align: left }
#tinytable_12d3s66vdraiwt31oplu td.tinytable_css_6xdoabik2wy8rhcbyouv, #tinytable_12d3s66vdraiwt31oplu th.tinytable_css_6xdoabik2wy8rhcbyouv { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_12d3s66vdraiwt31oplu" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept)</td>
<td data-row="1" data-col="2">3.04</td>
<td data-row="1" data-col="3">0.17</td>
<td data-row="1" data-col="4">17.81</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">3.045***</td>
<td data-row="1" data-col="7">21.00</td>
<td data-row="1" data-col="8">2000.00</td>
<td data-row="1" data-col="9">21***</td>
<td data-row="1" data-col="10">14.82</td>
<td data-row="1" data-col="11">28.98</td>
</tr>
<tr>
<td data-row="2" data-col="1">treatment2</td>
<td data-row="2" data-col="2">0.00</td>
<td data-row="2" data-col="3">0.20</td>
<td data-row="2" data-col="4">0.00</td>
<td data-row="2" data-col="5">1.00</td>
<td data-row="2" data-col="6">0</td>
<td data-row="2" data-col="7">1.00</td>
<td data-row="2" data-col="8">0.00</td>
<td data-row="2" data-col="9">1</td>
<td data-row="2" data-col="10">0.67</td>
<td data-row="2" data-col="11">1.48</td>
</tr>
<tr>
<td data-row="3" data-col="1">treatment3</td>
<td data-row="3" data-col="2">0.00</td>
<td data-row="3" data-col="3">0.20</td>
<td data-row="3" data-col="4">0.00</td>
<td data-row="3" data-col="5">1.00</td>
<td data-row="3" data-col="6">0</td>
<td data-row="3" data-col="7">1.00</td>
<td data-row="3" data-col="8">0.00</td>
<td data-row="3" data-col="9">1</td>
<td data-row="3" data-col="10">0.67</td>
<td data-row="3" data-col="11">1.48</td>
</tr>
<tr>
<td data-row="4" data-col="1">outcome2</td>
<td data-row="4" data-col="2">-0.45</td>
<td data-row="4" data-col="3">0.20</td>
<td data-row="4" data-col="4">-2.25</td>
<td data-row="4" data-col="5">0.02</td>
<td data-row="4" data-col="6">-0.454*</td>
<td data-row="4" data-col="7">0.63</td>
<td data-row="4" data-col="8">-36.51</td>
<td data-row="4" data-col="9">0.635*</td>
<td data-row="4" data-col="10">0.42</td>
<td data-row="4" data-col="11">0.94</td>
</tr>
<tr>
<td data-row="5" data-col="1">outcome3</td>
<td data-row="5" data-col="2">-0.29</td>
<td data-row="5" data-col="3">0.19</td>
<td data-row="5" data-col="4">-1.52</td>
<td data-row="5" data-col="5">0.13</td>
<td data-row="5" data-col="6">-0.293</td>
<td data-row="5" data-col="7">0.75</td>
<td data-row="5" data-col="8">-25.40</td>
<td data-row="5" data-col="9">0.746</td>
<td data-row="5" data-col="10">0.51</td>
<td data-row="5" data-col="11">1.09</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
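<p>As a quick sanity check (this snippet is not part of the original post), the <code>Odds_ratio</code> column can be reproduced by exponentiating the coefficients, and the confidence-interval columns appear to be <code>exp()</code> of the Wald limits:</p>
<pre>## sketch, assuming odds_summary() exponentiates Wald intervals
exp(coef(glm.D93))            # should match the Odds_ratio column
exp(confint.default(glm.D93)) # Wald CIs on the ratio scale</pre>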
<pre>anorex.1 &lt;- glm(Postwt ~ Prewt + Treat + offset(Prewt),
family = gaussian, data = anorexia
)
modelsummary::datasummary_df(odds_summary(anorex.1))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_awtmeapdm732006j9m4m = TinyTable.createTableFunctions("tinytable_awtmeapdm732006j9m4m");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '4', j: 1 }, { i: '4', j: 2 }, { i: '4', j: 3 }, { i: '4', j: 4 }, { i: '4', j: 5 }, { i: '4', j: 6 }, { i: '4', j: 7 }, { i: '4', j: 8 }, { i: '4', j: 9 }, { i: '4', j: 10 }, { i: '4', j: 11 } ], css_id: 'tinytable_css_xeicyxiqkgxks4x5ayeh',},
{ positions: [ { i: '1', j: 1 }, { i: '2', j: 1 }, { i: '3', j: 1 }, { i: '1', j: 2 }, { i: '2', j: 2 }, { i: '3', j: 2 }, { i: '1', j: 3 }, { i: '2', j: 3 }, { i: '3', j: 3 }, { i: '1', j: 4 }, { i: '2', j: 4 }, { i: '3', j: 4 }, { i: '1', j: 5 }, { i: '2', j: 5 }, { i: '3', j: 5 }, { i: '1', j: 6 }, { i: '2', j: 6 }, { i: '3', j: 6 }, { i: '1', j: 7 }, { i: '2', j: 7 }, { i: '3', j: 7 }, { i: '1', j: 8 }, { i: '2', j: 8 }, { i: '3', j: 8 }, { i: '1', j: 9 }, { i: '2', j: 9 }, { i: '3', j: 9 }, { i: '1', j: 10 }, { i: '2', j: 10 }, { i: '3', j: 10 }, { i: '1', j: 11 }, { i: '2', j: 11 }, { i: '3', j: 11 } ], css_id: 'tinytable_css_uuwvfoveya2y8hjsxswq',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_8ceydmhnrxf6zqyzc4ky',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_awtmeapdm732006j9m4m.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_awtmeapdm732006j9m4m td.tinytable_css_xeicyxiqkgxks4x5ayeh, #tinytable_awtmeapdm732006j9m4m th.tinytable_css_xeicyxiqkgxks4x5ayeh { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_awtmeapdm732006j9m4m td.tinytable_css_uuwvfoveya2y8hjsxswq, #tinytable_awtmeapdm732006j9m4m th.tinytable_css_uuwvfoveya2y8hjsxswq { text-align: left }
#tinytable_awtmeapdm732006j9m4m td.tinytable_css_8ceydmhnrxf6zqyzc4ky, #tinytable_awtmeapdm732006j9m4m th.tinytable_css_8ceydmhnrxf6zqyzc4ky { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_awtmeapdm732006j9m4m" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept)</td>
<td data-row="1" data-col="2">49.77</td>
<td data-row="1" data-col="3">13.39</td>
<td data-row="1" data-col="4">3.72</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">49.771***</td>
<td data-row="1" data-col="7">4.123994e+21</td>
<td data-row="1" data-col="8">4.123994e+23</td>
<td data-row="1" data-col="9">4.12399379732274e+21***</td>
<td data-row="1" data-col="10">1.647835e+10</td>
<td data-row="1" data-col="11">1.032101e+33</td>
</tr>
<tr>
<td data-row="2" data-col="1">Prewt</td>
<td data-row="2" data-col="2">-0.57</td>
<td data-row="2" data-col="3">0.16</td>
<td data-row="2" data-col="4">-3.51</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">-0.566***</td>
<td data-row="2" data-col="7">5.700000e-01</td>
<td data-row="2" data-col="8">-4.319000e+01</td>
<td data-row="2" data-col="9">0.568***</td>
<td data-row="2" data-col="10">4.100000e-01</td>
<td data-row="2" data-col="11">7.800000e-01</td>
</tr>
<tr>
<td data-row="3" data-col="1">TreatCont</td>
<td data-row="3" data-col="2">-4.10</td>
<td data-row="3" data-col="3">1.89</td>
<td data-row="3" data-col="4">-2.16</td>
<td data-row="3" data-col="5">0.03</td>
<td data-row="3" data-col="6">-4.097*</td>
<td data-row="3" data-col="7">2.000000e-02</td>
<td data-row="3" data-col="8">-9.834000e+01</td>
<td data-row="3" data-col="9">0.017*</td>
<td data-row="3" data-col="10">0.000000e+00</td>
<td data-row="3" data-col="11">6.800000e-01</td>
</tr>
<tr>
<td data-row="4" data-col="1">TreatFT</td>
<td data-row="4" data-col="2">4.56</td>
<td data-row="4" data-col="3">2.13</td>
<td data-row="4" data-col="4">2.14</td>
<td data-row="4" data-col="5">0.04</td>
<td data-row="4" data-col="6">4.563*</td>
<td data-row="4" data-col="7">9.588000e+01</td>
<td data-row="4" data-col="8">9.487670e+03</td>
<td data-row="4" data-col="9">95.877*</td>
<td data-row="4" data-col="10">1.460000e+00</td>
<td data-row="4" data-col="11">6.274970e+03</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
<pre>clotting &lt;- data.frame(
u = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18),
lot2 = c(69, 35, 26, 21, 18, 16, 13, 12, 12)
)
lot1 &lt;- glm(lot1 ~ log(u), data = clotting, family = Gamma)
modelsummary::datasummary_df(odds_summary(lot1))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_11ilqce3cgtx6ja24i91 = TinyTable.createTableFunctions("tinytable_11ilqce3cgtx6ja24i91");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '2', j: 1 }, { i: '2', j: 2 }, { i: '2', j: 3 }, { i: '2', j: 4 }, { i: '2', j: 5 }, { i: '2', j: 6 }, { i: '2', j: 7 }, { i: '2', j: 8 }, { i: '2', j: 9 }, { i: '2', j: 10 }, { i: '2', j: 11 } ], css_id: 'tinytable_css_ctvnze1pbzc7albeobq0',},
{ positions: [ { i: '1', j: 1 }, { i: '1', j: 2 }, { i: '1', j: 3 }, { i: '1', j: 4 }, { i: '1', j: 5 }, { i: '1', j: 6 }, { i: '1', j: 7 }, { i: '1', j: 8 }, { i: '1', j: 9 }, { i: '1', j: 10 }, { i: '1', j: 11 } ], css_id: 'tinytable_css_22tlklkjov7mxa9qllya',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_hkzvjfd73u5fjm8lxmct',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_11ilqce3cgtx6ja24i91.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_11ilqce3cgtx6ja24i91 td.tinytable_css_ctvnze1pbzc7albeobq0, #tinytable_11ilqce3cgtx6ja24i91 th.tinytable_css_ctvnze1pbzc7albeobq0 { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_11ilqce3cgtx6ja24i91 td.tinytable_css_22tlklkjov7mxa9qllya, #tinytable_11ilqce3cgtx6ja24i91 th.tinytable_css_22tlklkjov7mxa9qllya { text-align: left }
#tinytable_11ilqce3cgtx6ja24i91 td.tinytable_css_hkzvjfd73u5fjm8lxmct, #tinytable_11ilqce3cgtx6ja24i91 th.tinytable_css_hkzvjfd73u5fjm8lxmct { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_11ilqce3cgtx6ja24i91" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept)</td>
<td data-row="1" data-col="2">-0.02</td>
<td data-row="1" data-col="3">0.00</td>
<td data-row="1" data-col="4">-17.85</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">-0.017***</td>
<td data-row="1" data-col="7">0.98</td>
<td data-row="1" data-col="8">-1.64</td>
<td data-row="1" data-col="9">0.984***</td>
<td data-row="1" data-col="10">0.98</td>
<td data-row="1" data-col="11">0.99</td>
</tr>
<tr>
<td data-row="2" data-col="1">log(u)</td>
<td data-row="2" data-col="2">0.02</td>
<td data-row="2" data-col="3">0.00</td>
<td data-row="2" data-col="4">36.97</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">0.015***</td>
<td data-row="2" data-col="7">1.02</td>
<td data-row="2" data-col="8">1.55</td>
<td data-row="2" data-col="9">1.015***</td>
<td data-row="2" data-col="10">1.01</td>
<td data-row="2" data-col="11">1.02</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
<pre>lot2 &lt;- glm(lot2 ~ log(u), data = clotting, family = Gamma)
modelsummary::datasummary_df(odds_summary(lot2))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_a25ew0ljz2ydb62j7g81 = TinyTable.createTableFunctions("tinytable_a25ew0ljz2ydb62j7g81");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '2', j: 1 }, { i: '2', j: 2 }, { i: '2', j: 3 }, { i: '2', j: 4 }, { i: '2', j: 5 }, { i: '2', j: 6 }, { i: '2', j: 7 }, { i: '2', j: 8 }, { i: '2', j: 9 }, { i: '2', j: 10 }, { i: '2', j: 11 } ], css_id: 'tinytable_css_2jnv1n9gua6mhl6wu7to',},
{ positions: [ { i: '1', j: 1 }, { i: '1', j: 2 }, { i: '1', j: 3 }, { i: '1', j: 4 }, { i: '1', j: 5 }, { i: '1', j: 6 }, { i: '1', j: 7 }, { i: '1', j: 8 }, { i: '1', j: 9 }, { i: '1', j: 10 }, { i: '1', j: 11 } ], css_id: 'tinytable_css_tmwjxxt1dvryppey3b43',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_4efkqa67vfag1dvw3cjg',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_a25ew0ljz2ydb62j7g81.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_a25ew0ljz2ydb62j7g81 td.tinytable_css_2jnv1n9gua6mhl6wu7to, #tinytable_a25ew0ljz2ydb62j7g81 th.tinytable_css_2jnv1n9gua6mhl6wu7to { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_a25ew0ljz2ydb62j7g81 td.tinytable_css_tmwjxxt1dvryppey3b43, #tinytable_a25ew0ljz2ydb62j7g81 th.tinytable_css_tmwjxxt1dvryppey3b43 { text-align: left }
#tinytable_a25ew0ljz2ydb62j7g81 td.tinytable_css_4efkqa67vfag1dvw3cjg, #tinytable_a25ew0ljz2ydb62j7g81 th.tinytable_css_4efkqa67vfag1dvw3cjg { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_a25ew0ljz2ydb62j7g81" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept)</td>
<td data-row="1" data-col="2">-0.02</td>
<td data-row="1" data-col="3">0.00</td>
<td data-row="1" data-col="4">-18.02</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">-0.024***</td>
<td data-row="1" data-col="7">0.98</td>
<td data-row="1" data-col="8">-2.36</td>
<td data-row="1" data-col="9">0.976***</td>
<td data-row="1" data-col="10">0.97</td>
<td data-row="1" data-col="11">0.98</td>
</tr>
<tr>
<td data-row="2" data-col="1">log(u)</td>
<td data-row="2" data-col="2">0.02</td>
<td data-row="2" data-col="3">0.00</td>
<td data-row="2" data-col="4">40.92</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">0.024***</td>
<td data-row="2" data-col="7">1.02</td>
<td data-row="2" data-col="8">2.39</td>
<td data-row="2" data-col="9">1.024***</td>
<td data-row="2" data-col="10">1.02</td>
<td data-row="2" data-col="11">1.03</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
<pre>x &lt;- rnorm(100)
y &lt;- rpois(100, exp(1 + x))
lm2 &lt;- glm(y ~ x, family = quasi(variance = &quot;mu&quot;, link = &quot;log&quot;))
modelsummary::datasummary_df(odds_summary(lm2))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_re4al20x89j9zu51a5fl = TinyTable.createTableFunctions("tinytable_re4al20x89j9zu51a5fl");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '2', j: 1 }, { i: '2', j: 2 }, { i: '2', j: 3 }, { i: '2', j: 4 }, { i: '2', j: 5 }, { i: '2', j: 6 }, { i: '2', j: 7 }, { i: '2', j: 8 }, { i: '2', j: 9 }, { i: '2', j: 10 }, { i: '2', j: 11 } ], css_id: 'tinytable_css_4f5y8vo4p4tyoqcio0no',},
{ positions: [ { i: '1', j: 1 }, { i: '1', j: 2 }, { i: '1', j: 3 }, { i: '1', j: 4 }, { i: '1', j: 5 }, { i: '1', j: 6 }, { i: '1', j: 7 }, { i: '1', j: 8 }, { i: '1', j: 9 }, { i: '1', j: 10 }, { i: '1', j: 11 } ], css_id: 'tinytable_css_7nqc4nte1l0g8p0vwhoe',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_m8fzsl9k55oweue3w4f2',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_re4al20x89j9zu51a5fl.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_re4al20x89j9zu51a5fl td.tinytable_css_4f5y8vo4p4tyoqcio0no, #tinytable_re4al20x89j9zu51a5fl th.tinytable_css_4f5y8vo4p4tyoqcio0no { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_re4al20x89j9zu51a5fl td.tinytable_css_7nqc4nte1l0g8p0vwhoe, #tinytable_re4al20x89j9zu51a5fl th.tinytable_css_7nqc4nte1l0g8p0vwhoe { text-align: left }
#tinytable_re4al20x89j9zu51a5fl td.tinytable_css_m8fzsl9k55oweue3w4f2, #tinytable_re4al20x89j9zu51a5fl th.tinytable_css_m8fzsl9k55oweue3w4f2 { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_re4al20x89j9zu51a5fl" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept)</td>
<td data-row="1" data-col="2">0.92</td>
<td data-row="1" data-col="3">0.06</td>
<td data-row="1" data-col="4">14.45</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">0.915***</td>
<td data-row="1" data-col="7">2.50</td>
<td data-row="1" data-col="8">149.71</td>
<td data-row="1" data-col="9">2.497***</td>
<td data-row="1" data-col="10">2.20</td>
<td data-row="1" data-col="11">2.82</td>
</tr>
<tr>
<td data-row="2" data-col="1">x</td>
<td data-row="2" data-col="2">1.05</td>
<td data-row="2" data-col="3">0.04</td>
<td data-row="2" data-col="4">29.75</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">1.053***</td>
<td data-row="2" data-col="7">2.87</td>
<td data-row="2" data-col="8">186.74</td>
<td data-row="2" data-col="9">2.867***</td>
<td data-row="2" data-col="10">2.67</td>
<td data-row="2" data-col="11">3.07</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
<pre>lm3 &lt;- glm(y ~ x, family = poisson)
modelsummary::datasummary_df(odds_summary(lm3))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_7lcbhbnn06e1wkeku720 = TinyTable.createTableFunctions("tinytable_7lcbhbnn06e1wkeku720");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '2', j: 1 }, { i: '2', j: 2 }, { i: '2', j: 3 }, { i: '2', j: 4 }, { i: '2', j: 5 }, { i: '2', j: 6 }, { i: '2', j: 7 }, { i: '2', j: 8 }, { i: '2', j: 9 }, { i: '2', j: 10 }, { i: '2', j: 11 } ], css_id: 'tinytable_css_lsaliww65vddk863jva0',},
{ positions: [ { i: '1', j: 1 }, { i: '1', j: 2 }, { i: '1', j: 3 }, { i: '1', j: 4 }, { i: '1', j: 5 }, { i: '1', j: 6 }, { i: '1', j: 7 }, { i: '1', j: 8 }, { i: '1', j: 9 }, { i: '1', j: 10 }, { i: '1', j: 11 } ], css_id: 'tinytable_css_k39fgspo5jsxoplrce6t',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_3eodsi0kib610k7w4q15',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_7lcbhbnn06e1wkeku720.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_7lcbhbnn06e1wkeku720 td.tinytable_css_lsaliww65vddk863jva0, #tinytable_7lcbhbnn06e1wkeku720 th.tinytable_css_lsaliww65vddk863jva0 { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_7lcbhbnn06e1wkeku720 td.tinytable_css_k39fgspo5jsxoplrce6t, #tinytable_7lcbhbnn06e1wkeku720 th.tinytable_css_k39fgspo5jsxoplrce6t { text-align: left }
#tinytable_7lcbhbnn06e1wkeku720 td.tinytable_css_3eodsi0kib610k7w4q15, #tinytable_7lcbhbnn06e1wkeku720 th.tinytable_css_3eodsi0kib610k7w4q15 { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_7lcbhbnn06e1wkeku720" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept)</td>
<td data-row="1" data-col="2">0.92</td>
<td data-row="1" data-col="3">0.07</td>
<td data-row="1" data-col="4">13.56</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">0.915***</td>
<td data-row="1" data-col="7">2.50</td>
<td data-row="1" data-col="8">149.71</td>
<td data-row="1" data-col="9">2.497***</td>
<td data-row="1" data-col="10">2.18</td>
<td data-row="1" data-col="11">2.84</td>
</tr>
<tr>
<td data-row="2" data-col="1">x</td>
<td data-row="2" data-col="2">1.05</td>
<td data-row="2" data-col="3">0.04</td>
<td data-row="2" data-col="4">27.92</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">1.053***</td>
<td data-row="2" data-col="7">2.87</td>
<td data-row="2" data-col="8">186.74</td>
<td data-row="2" data-col="9">2.867***</td>
<td data-row="2" data-col="10">2.66</td>
<td data-row="2" data-col="11">3.09</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
</div>
<div id="mlogit-models" class="section level2">
<h2>mlogit models</h2>
<pre>library(mlogit)
data(&quot;Fishing&quot;, package = &quot;mlogit&quot;)
Fish &lt;- dfidx(Fishing, varying = 2:9, shape = &quot;wide&quot;, choice = &quot;mode&quot;)
## a pure &quot;conditional&quot; model
mml &lt;- mlogit(mode ~ price + catch, data = Fish)
modelsummary::datasummary_df(odds_summary(mml))</pre>
<!-- preamble start -->
<script src="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.js"></script>
<script>
// Create table-specific functions using external factory
const tableFns_yo9xl3tdb0dmhpbubepa = TinyTable.createTableFunctions("tinytable_yo9xl3tdb0dmhpbubepa");
// tinytable span after
window.addEventListener('load', function () {
var cellsToStyle = [
// tinytable style arrays after
{ positions: [ { i: '5', j: 1 }, { i: '5', j: 2 }, { i: '5', j: 3 }, { i: '5', j: 4 }, { i: '5', j: 5 }, { i: '5', j: 6 }, { i: '5', j: 7 }, { i: '5', j: 8 }, { i: '5', j: 9 }, { i: '5', j: 10 }, { i: '5', j: 11 } ], css_id: 'tinytable_css_wbga30kl45m8t587d9i7',},
{ positions: [ { i: '1', j: 1 }, { i: '2', j: 1 }, { i: '3', j: 1 }, { i: '4', j: 1 }, { i: '1', j: 2 }, { i: '2', j: 2 }, { i: '3', j: 2 }, { i: '4', j: 2 }, { i: '1', j: 3 }, { i: '2', j: 3 }, { i: '3', j: 3 }, { i: '4', j: 3 }, { i: '1', j: 4 }, { i: '2', j: 4 }, { i: '3', j: 4 }, { i: '4', j: 4 }, { i: '1', j: 5 }, { i: '2', j: 5 }, { i: '3', j: 5 }, { i: '4', j: 5 }, { i: '1', j: 6 }, { i: '2', j: 6 }, { i: '3', j: 6 }, { i: '4', j: 6 }, { i: '1', j: 7 }, { i: '2', j: 7 }, { i: '3', j: 7 }, { i: '4', j: 7 }, { i: '1', j: 8 }, { i: '2', j: 8 }, { i: '3', j: 8 }, { i: '4', j: 8 }, { i: '1', j: 9 }, { i: '2', j: 9 }, { i: '3', j: 9 }, { i: '4', j: 9 }, { i: '1', j: 10 }, { i: '2', j: 10 }, { i: '3', j: 10 }, { i: '4', j: 10 }, { i: '1', j: 11 }, { i: '2', j: 11 }, { i: '3', j: 11 }, { i: '4', j: 11 } ], css_id: 'tinytable_css_239hwaw935p9dz755qze',},
{ positions: [ { i: '0', j: 1 }, { i: '0', j: 2 }, { i: '0', j: 3 }, { i: '0', j: 4 }, { i: '0', j: 5 }, { i: '0', j: 6 }, { i: '0', j: 7 }, { i: '0', j: 8 }, { i: '0', j: 9 }, { i: '0', j: 10 }, { i: '0', j: 11 } ], css_id: 'tinytable_css_chmy26l12rxcl0kzjmvz',},
];
// Loop over the arrays to style the cells
cellsToStyle.forEach(function (group) {
group.positions.forEach(function (cell) {
tableFns_yo9xl3tdb0dmhpbubepa.styleCell(cell.i, cell.j, group.css_id);
});
});
});
</script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/vincentarelbundock/tinytable@main/inst/tinytable.css">
<style>
/* tinytable css entries after */
#tinytable_yo9xl3tdb0dmhpbubepa td.tinytable_css_wbga30kl45m8t587d9i7, #tinytable_yo9xl3tdb0dmhpbubepa th.tinytable_css_wbga30kl45m8t587d9i7 { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 0; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.08em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.1em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
#tinytable_yo9xl3tdb0dmhpbubepa td.tinytable_css_239hwaw935p9dz755qze, #tinytable_yo9xl3tdb0dmhpbubepa th.tinytable_css_239hwaw935p9dz755qze { text-align: left }
#tinytable_yo9xl3tdb0dmhpbubepa td.tinytable_css_chmy26l12rxcl0kzjmvz, #tinytable_yo9xl3tdb0dmhpbubepa th.tinytable_css_chmy26l12rxcl0kzjmvz { position: relative; --border-bottom: 1; --border-left: 0; --border-right: 0; --border-top: 1; --line-color-bottom: var(--tt-line-color); --line-color-left: var(--tt-line-color); --line-color-right: var(--tt-line-color); --line-color-top: var(--tt-line-color); --line-width-bottom: 0.05em; --line-width-left: 0.1em; --line-width-right: 0.1em; --line-width-top: 0.08em; --trim-bottom-left: 0%; --trim-bottom-right: 0%; --trim-left-bottom: 0%; --trim-left-top: 0%; --trim-right-bottom: 0%; --trim-right-top: 0%; --trim-top-left: 0%; --trim-top-right: 0%; ; text-align: left }
</style>
<div class="container">
<table class="tinytable" id="tinytable_yo9xl3tdb0dmhpbubepa" style="width: auto; margin-left: auto; margin-right: auto;" data-quarto-disable-processing='true'>
<thead>
<tr>
<th scope="col" data-row="0" data-col="1">Variables</th>
<th scope="col" data-row="0" data-col="2">Coefficient</th>
<th scope="col" data-row="0" data-col="3">Std Error</th>
<th scope="col" data-row="0" data-col="4">t value</th>
<th scope="col" data-row="0" data-col="5">p value</th>
<th scope="col" data-row="0" data-col="6">Coef Sig</th>
<th scope="col" data-row="0" data-col="7">Odds_ratio</th>
<th scope="col" data-row="0" data-col="8">%</th>
<th scope="col" data-row="0" data-col="9">Odds Sig</th>
<th scope="col" data-row="0" data-col="10">CI_lower</th>
<th scope="col" data-row="0" data-col="11">CI_upper</th>
</tr>
</thead>
<tbody>
<tr>
<td data-row="1" data-col="1">(Intercept):boat</td>
<td data-row="1" data-col="2">0.87</td>
<td data-row="1" data-col="3">0.11</td>
<td data-row="1" data-col="4">7.64</td>
<td data-row="1" data-col="5">0.00</td>
<td data-row="1" data-col="6">0.871***</td>
<td data-row="1" data-col="7">2.39</td>
<td data-row="1" data-col="8">139.02</td>
<td data-row="1" data-col="9">2.39***</td>
<td data-row="1" data-col="10">1.91</td>
<td data-row="1" data-col="11">2.99</td>
</tr>
<tr>
<td data-row="2" data-col="1">(Intercept):charter</td>
<td data-row="2" data-col="2">1.50</td>
<td data-row="2" data-col="3">0.13</td>
<td data-row="2" data-col="4">11.28</td>
<td data-row="2" data-col="5">0.00</td>
<td data-row="2" data-col="6">1.499***</td>
<td data-row="2" data-col="7">4.48</td>
<td data-row="2" data-col="8">347.67</td>
<td data-row="2" data-col="9">4.477***</td>
<td data-row="2" data-col="10">3.45</td>
<td data-row="2" data-col="11">5.81</td>
</tr>
<tr>
<td data-row="3" data-col="1">(Intercept):pier</td>
<td data-row="3" data-col="2">0.31</td>
<td data-row="3" data-col="3">0.11</td>
<td data-row="3" data-col="4">2.68</td>
<td data-row="3" data-col="5">0.01</td>
<td data-row="3" data-col="6">0.307**</td>
<td data-row="3" data-col="7">1.36</td>
<td data-row="3" data-col="8">35.94</td>
<td data-row="3" data-col="9">1.359**</td>
<td data-row="3" data-col="10">1.09</td>
<td data-row="3" data-col="11">1.70</td>
</tr>
<tr>
<td data-row="4" data-col="1">price</td>
<td data-row="4" data-col="2">-0.02</td>
<td data-row="4" data-col="3">0.00</td>
<td data-row="4" data-col="4">-14.54</td>
<td data-row="4" data-col="5">0.00</td>
<td data-row="4" data-col="6">-0.025***</td>
<td data-row="4" data-col="7">0.98</td>
<td data-row="4" data-col="8">-2.45</td>
<td data-row="4" data-col="9">0.976***</td>
<td data-row="4" data-col="10">0.97</td>
<td data-row="4" data-col="11">0.98</td>
</tr>
<tr>
<td data-row="5" data-col="1">catch</td>
<td data-row="5" data-col="2">0.38</td>
<td data-row="5" data-col="3">0.11</td>
<td data-row="5" data-col="4">3.43</td>
<td data-row="5" data-col="5">0.00</td>
<td data-row="5" data-col="6">0.377***</td>
<td data-row="5" data-col="7">1.46</td>
<td data-row="5" data-col="8">45.82</td>
<td data-row="5" data-col="9">1.458***</td>
<td data-row="5" data-col="10">1.18</td>
<td data-row="5" data-col="11">1.81</td>
</tr>
</tbody>
</table>
</div>
<!-- hack to avoid NA insertion in last line -->
</div>
<div id="multinomial-logistic-model" class="section level2">
<h2>Multinomial Logistic model</h2>
<p>For multinomial logistic regression, each of the measures is a data frame, so the summary is a list of data frames, one for each non-reference level of the dependent variable.</p>
<pre>library(nnet)
tinom &lt;- multinom(Species ~ Petal.Length + Petal.Width + Sepal.Length
+ Sepal.Width, trace = FALSE, data = iris)
odds_summary(tinom)
$coefficient
Variables versicolor virginica
1 (Intercept) 18.690374 -23.836276
2 Petal.Length 14.244770 23.659779
3 Petal.Width -3.097684 15.135301
4 Sepal.Length -5.458424 -7.923634
5 Sepal.Width -8.707401 -15.370769
$t_value
versicolor virginica
1 0.53445109 -0.66644166
2 0.23665670 0.39128070
3 -0.06809815 0.32950063
4 -0.06072192 -0.08812701
5 -0.05544649 -0.09782845
$Odds_ratio
Variables versicolor virginica
1 (Intercept) 1.309563e+08 4.446690e-11
2 Petal.Length 1.536120e+06 1.885001e+10
3 Petal.Width 4.515366e-02 3.742635e+06
4 Sepal.Length 4.260265e-03 3.620841e-04
5 Sepal.Width 1.653575e-04 2.111348e-07
$Percent_odds
Variables versicolor virginica
1 (Intercept) 1.309563e+10 -1.000000e+02
2 Petal.Length 1.536119e+08 1.885001e+12
3 Petal.Width -9.548463e+01 3.742634e+08
4 Sepal.Length -9.957397e+01 -9.996379e+01
5 Sepal.Width -9.998346e+01 -9.999998e+01
$Coef_sig
versicolor virginica
1 18.69 -23.836
2 14.245 23.66
3 -3.098 15.135
4 -5.458 -7.924
5 -8.707 -15.371
$Odds_sig
versicolor virginica
1 130956302.531 0
2 1536119.713 18850009278.36
3 0.045 3742635.304
4 0.004 0
5 0 0
$p_value
versicolor virginica
1 0.5930295 0.5051288
2 0.8129231 0.6955898
3 0.9457075 0.7417773
4 0.9515807 0.9297757
5 0.9557828 0.9220685
$Confident_interval
Variables Lower versicolor Upper versicolor Lower virginica
1 (Intercept) -49.85184 87.23259 -93.93730
2 Petal.Length -103.72880 132.21834 -94.85441
3 Petal.Width -92.25354 86.05818 -74.89380
4 Sepal.Length -181.64381 170.72696 -184.14699
5 Sepal.Width -316.50312 299.08832 -323.31956
Upper virginica
1 46.26475
2 142.17397
3 105.16440
4 168.29972
5 292.57802</pre>
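<p>Because the multinomial summary is a named list, any single component can be pulled out with the usual <code>$</code> operator. A minimal sketch, continuing from the <code>tinom</code> model fitted above:</p>
<pre>## odds ratios only, one column per non-reference level
odds_summary(tinom)$Odds_ratio

## p values for each coefficient
odds_summary(tinom)$p_value</pre>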
</div>
</div>
<div id="conclusion" class="section level1">
<h1>Conclusion</h1>
<p>The odds_summary function represents a critical advancement in the practical interpretation of probabilistic estimates within the <em>Dyn4cast</em>
environment. Its significance lies not in computation alone, but in converting statistical output into actionable knowledge. In econometric modelling systems, prediction without interpretation is incomplete. The odds_summary function closes that gap.</p>
</div>
<div id="attrition" class="section level1">
<h1>Attribution</h1>
<pre>Cite this article as:
Nmadu J (2025). odds_summary: Turning Probabilistic Estimates into Clear, Decision-Ready Insights. https://www.jobnmadu.com/r-blog/.
To cite package &#39;Dyn4cast&#39; in publications use:
Nmadu J (2025). _Dyn4cast: Dynamic Modeling and Machine Learning
Environment_. R package version 11.11.26,
&lt;https://github.com/JobNmadu/Dyn4cast&gt;.
A BibTeX entry for LaTeX users is
@Manual{,
title = {_Dyn4cast: Dynamic Modeling and Machine Learning Environment_},
author = {Job Nmadu},
year = {2025},
note = {R package version 11.11.26},
url = {https://github.com/JobNmadu/Dyn4cast},
}</pre>
<p>Welcome to Data Science and Machine Learning!</p>
</div>
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://www.jobnmadu.com/r-blog/2026-03-23-r-markdown/oddsratio/"> R-Blog on Data modelling to develop ...</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/odds_summary-turning-probabilistic-estimates-into-clear-decision-ready-insights/">odds_summary: Turning Probabilistic Estimates into Clear, Decision-Ready Insights</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400070</post-id>	</item>
		<item>
		<title>Why Advanced Attribution Matters More When Everyone Else Uses Last Touch</title>
		<link>https://www.r-bloggers.com/2026/03/why-advanced-attribution-matters-more-when-everyone-else-uses-last-touch/</link>
		
		<dc:creator><![CDATA[Florian Teschner]]></dc:creator>
		<pubDate>Sun, 22 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">http://flovv.github.io/advanced-attribution-when-everyone-uses-last-touch/</guid>

					<description><![CDATA[<p>Short practical advice on attribution:</p>
<p>    Treat last-touch as a bias, not as a neutral baseline - If most advertisers optimize to last-touch conversions, budgets and auction pressure will drift toward channels that are easy to credit rather tha...</p>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/why-advanced-attribution-matters-more-when-everyone-else-uses-last-touch/">Why Advanced Attribution Matters More When Everyone Else Uses Last Touch</a>]]></description>
					<content:encoded><![CDATA[

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="http://flovv.github.io/advanced-attribution-when-everyone-uses-last-touch/"> Florian Teschner</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p><img src="https://i2.wp.com/flovv.github.io/images/advanced-attribution-cover_0.jpg?w=578" alt="Beyond last touch cover" data-recalc-dims="1" /></p>

<h2 id="short-practical-advice-on-attribution">Short practical advice on attribution:</h2>

<ol>
  <li>
    <p><strong>Treat last-touch as a bias, not as a neutral baseline</strong> &#8211; If most advertisers optimize to last-touch conversions, budgets and auction pressure will drift toward channels that are easy to credit rather than channels that are truly incremental.</p>
  </li>
  <li>
    <p><strong>Advanced attribution is valuable because it changes spending behavior</strong> &#8211; The upside is not cleaner reporting. The upside is spending less on overpriced “credit-capturing” inventory and more on channels that create demand earlier in the journey.</p>
  </li>
  <li>
    <p><strong>Non-click measurement matters most in click-biased markets</strong> &#8211; If the market still rewards final clicks, then impression effects, sequence effects, and assisted conversions are systematically underpriced.</p>
  </li>
  <li>
    <p><strong>Do not expect more sophistication to automatically raise profit</strong> &#8211; Better measurement only helps if it changes bidding, budget allocation, and evaluation rules inside the company.</p>
  </li>
  <li>
    <p><strong>Add incrementality before adding complexity</strong> &#8211; A simple holdout, geo test, or lift study is often more useful than another dashboard built on the same click-based attribution logic.</p>
  </li>
</ol>

<h2 id="long-version">Long Version</h2>

<p>I just read Ron Berman’s paper <em>Beyond the Last Touch: Attribution in Online Advertising</em>, and I think the most useful implication for practitioners is slightly different from the headline result.</p>

<p>The paper compares no attribution, last-touch attribution, and Shapley-value attribution in a multi-publisher advertising market. Its central result is that <strong>attribution is not just a reporting layer. It changes bidding incentives and therefore changes market outcomes.</strong></p>

<p>That matters because most advertisers still do not operate with advanced measurement. In many cases they rely on some mix of:</p>

<ul>
  <li>last-touch or last-click attribution</li>
  <li>platform-reported conversions</li>
  <li>weak or infrequent incrementality testing</li>
  <li>budget allocation rules based on short-term ROAS</li>
</ul>

<p>In that world, advanced attribution has a second source of value: it helps you avoid the distortions created by everyone else’s bad measurement.</p>

<h3 id="what-the-paper-actually-shows">What the paper actually shows</h3>

<p>Berman models a market where one advertiser buys media across multiple publishers while other advertisers are more local. Consumers may see ads across different publishers, and one exposure can affect the value of another. That creates externalities across touchpoints.</p>

<p>The important result is that these externalities make optimization hard even when all parties have symmetric information. This is not just a fraud problem or a platform-information problem. It is a structural feature of multi-publisher advertising.</p>

<p>The paper then shows two things that are highly relevant in practice:</p>

<ol>
  <li><strong>Last-touch attribution often pushes advertisers to overbid.</strong></li>
  <li><strong>A more balanced attribution rule such as the Shapley value usually performs better than last touch for the advertiser.</strong></li>
</ol>

<p>There is also a useful warning in the paper: more accurate attribution does not always increase advertiser profit. Once you account for market equilibrium and competitor response, the relationship between better measurement and better outcomes is not trivial.</p>

<p>That is the academic result. The practical extension is where things get interesting.</p>

<h3 id="why-advanced-attribution-becomes-more-valuable-when-most-advertisers-still-use-last-touch">Why advanced attribution becomes more valuable when most advertisers still use last touch</h3>

<p>If most of the market uses last-touch logic, then the market is not being priced on incremental contribution. It is being priced on who gets the final credit.</p>

<p>That sounds like a reporting issue, but it is really a budget allocation issue.</p>

<p>Channels that close demand tend to look stronger than they are. Branded search, retargeting, affiliate traffic, and other lower-funnel placements often appear to be doing all the work because they sit close to the conversion event. Channels that create demand earlier in the journey often look weaker because they rarely receive the final touch.</p>

<p>If enough advertisers optimize this way, a predictable pattern emerges:</p>

<ul>
  <li>lower-funnel inventory becomes crowded and expensive</li>
  <li>channels that harvest existing intent absorb too much budget</li>
  <li>upper- and mid-funnel channels look worse than their true contribution</li>
  <li>advertisers confuse ease of measurement with causal impact</li>
</ul>

<p>This is exactly where advanced attribution helps.</p>

<p>It helps not because it gives you a prettier customer journey chart, but because it gives you a better pricing model for media. If your attribution system captures assisted effects, non-click influence, or incremental lift more accurately than the market standard, you can avoid overpaying for touchpoints that are merely good at showing up last.</p>

<h3 id="why-non-click-measurement-matters">Why non-click measurement matters</h3>

<p>The paper itself is about attribution rules, not a direct comparison of click-based versus non-click-based systems. But the implication is clear.</p>

<p>If the dominant market standard is click-biased last-touch measurement, then any method that can recover value from non-click exposures has a structural advantage.</p>

<p>That could include:</p>

<ul>
  <li>impression-based attribution</li>
  <li>conversion lift studies</li>
  <li>geo experiments</li>
  <li>holdout tests</li>
  <li>media mix modeling</li>
  <li>sequence analysis using first-party exposure data</li>
</ul>

<p>These methods are imperfect, and they answer slightly different questions. But they all do something last-touch usually cannot do well: they credit touchpoints that influence conversion without demanding a final click.</p>

<p>In a market full of click-based optimization, that means advanced measurement can uncover media that is genuinely incremental but systematically undervalued.</p>

<h3 id="the-real-gain-is-not-better-reporting-it-is-better-spending-behavior">The real gain is not better reporting. It is better spending behavior.</h3>

<p>I think this is the most important practical takeaway from the paper.</p>

<p>Advanced attribution should be judged by whether it improves decisions such as:</p>

<ul>
  <li>Which channels deserve more budget?</li>
  <li>Which channels are just harvesting demand that already exists?</li>
  <li>Where are we overbidding because the market over-credits the final touch?</li>
  <li>Which publishers look weak only because our measurement ignores assisted effects?</li>
</ul>

<p>If your attribution system changes those decisions, it has value.</p>

<p>If it only creates a more sophisticated dashboard while the company still optimizes to platform ROAS and last-click conversions, it has much less value.</p>

<p>That is also why many measurement projects disappoint. The model gets better, but the organization does not change the control system. Finance still trusts the old KPI. Paid media teams still chase the same targets. Creative teams still optimize for clicks because clicks are what get reported weekly.</p>

<p>The paper’s logic fits this well: attribution changes incentives. If the incentives do not change, the gains from attribution are limited.</p>

<h3 id="a-simple-mental-model-for-advertisers">A simple mental model for advertisers</h3>

<p>If most competitors use last touch, assume three things:</p>

<ol>
  <li><strong>Some inventory is overpriced because it captures credit, not because it creates lift.</strong></li>
  <li><strong>Some inventory is underpriced because it influences conversion without closing it.</strong></li>
  <li><strong>Your edge comes from estimating that gap better than the market.</strong></li>
</ol>

<p>That is why advanced attribution and incrementality measurement can be valuable even if they are noisy. You do not need a perfect model. You need a model that is less wrong than the one most of the market is using.</p>

<h3 id="where-the-paper-is-most-useful-and-where-it-is-limited">Where the paper is most useful, and where it is limited</h3>

<p>One thing I like about the paper is that it is transparent about trade-offs. It does not claim that better attribution always improves everything. In fact, some attribution rules improve advertiser profit partly by shifting profit away from publishers, not by massively improving total market efficiency.</p>

<p>That is a useful correction to the usual industry narrative that “better measurement” always makes the whole system work better.</p>

<p>At the same time, the paper is stylized. It uses a game-theoretic model with a limited number of publishers and advertisers. It is not a direct operational guide to how to run MMM, lift testing, or conversion APIs in a modern stack.</p>

<p>So I would use it for the strategic lesson, not for a literal implementation recipe.</p>

<p>The strategic lesson is:</p>

<p><strong>When the market overuses last-touch attribution, advanced measurement becomes more valuable because it helps advertisers resist the bidding and budgeting distortions created by everyone else’s simplistic measurement.</strong></p>

<h3 id="what-i-would-do-in-practice">What I would do in practice</h3>

<p>If I were working with an advertiser that still leans heavily on last-touch reporting, I would not start by building a highly complex attribution model.</p>

<p>I would start with a simpler sequence:</p>

<ol>
  <li>Keep last-touch reporting as an operational lens, not the source of truth.</li>
  <li>Add one credible incrementality layer such as a holdout test, geo test, or lift study.</li>
  <li>Separate channels that <strong>create demand</strong> from channels that mostly <strong>capture demand</strong>.</li>
  <li>Use attribution outputs to inform budget ranges, not to automate every bid immediately.</li>
  <li>Revisit evaluation metrics so teams are not punished for investing in channels that assist but rarely close.</li>
</ol>

<p>That is probably the highest-return path for most advertisers because the biggest issue is usually not lack of modeling sophistication. It is overdependence on a biased attribution rule.</p>

<h3 id="bottom-line">Bottom line</h3>

<p>If most advertisers still use last touch and weak testing, advanced attribution is more valuable than it first appears.</p>

<p>Not because it makes reports look better, but because it can help you spend against <strong>incremental contribution</strong> while the rest of the market spends against <strong>credited contribution</strong>.</p>

<p>That gap is where the advantage is.</p>

<hr />

<p>Paper: Ron Berman, <em>Beyond the Last Touch: Attribution in Online Advertising</em>.</p>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="http://flovv.github.io/advanced-attribution-when-everyone-uses-last-touch/"> Florian Teschner</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/why-advanced-attribution-matters-more-when-everyone-else-uses-last-touch/">Why Advanced Attribution Matters More When Everyone Else Uses Last Touch</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400041</post-id>	</item>
		<item>
		<title>Speeding up tidySummarizedExperiment through query optimisation and the plyxp backend</title>
		<link>https://www.r-bloggers.com/2026/03/speeding-up-tidysummarizedexperiment-through-query-optimisation-and-the-plyxp-backend/</link>
		
		<dc:creator><![CDATA[Stefano Mangiola]]></dc:creator>
		<pubDate>Sun, 22 Mar 2026 00:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://tidyomics.github.io/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization/</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; ">
<p>tidySummarizedExperiment logo</p>
<p>Contributors: Michael Love, Justin Landis, Pierre-Paul Axisa<br />
The generality of tidySummarizedExperiment makes it easy to interface with several tidyverse packages (e.g. dplyr, tidyr, ggplot2, purrr, plotly...</p></div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/speeding-up-tidysummarizedexperiment-through-query-optimisation-and-the-plyxp-backend/">Speeding up tidySummarizedExperiment through query optimisation and the plyxp backend</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://tidyomics.github.io/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization/"> tidyomicsBlog</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issue about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
 





<div class="quarto-figure quarto-figure-left">
<figure class="figure">
<p><img src="https://i2.wp.com/tidyomics.github.io/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization/logo.png?w=150&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
<figcaption>tidySummarizedExperiment logo</figcaption>
</figure>
</div>
<p><em>Contributors: Michael Love, Justin Landis, Pierre-Paul Axisa</em></p>
<p>The generality of <a href="https://bioconductor.org/packages/tidySummarizedExperiment" rel="nofollow" target="_blank"><code>tidySummarizedExperiment</code></a> makes it easy to interface with several <a href="https://www.tidyverse.org/" rel="nofollow" target="_blank"><code>tidyverse</code></a> packages (e.g. <a href="https://cran.r-project.org/package=dplyr" rel="nofollow" target="_blank"><code>dplyr</code></a>, <a href="https://cran.r-project.org/package=tidyr" rel="nofollow" target="_blank"><code>tidyr</code></a>, <a href="https://cran.r-project.org/package=ggplot2" rel="nofollow" target="_blank"><code>ggplot2</code></a>, <a href="https://cran.r-project.org/package=purrr" rel="nofollow" target="_blank"><code>purrr</code></a>, <a href="https://cran.r-project.org/package=plotly" rel="nofollow" target="_blank"><code>plotly</code></a>). This is possible thanks to its approach of converting <a href="https://bioconductor.org/packages/SummarizedExperiment" rel="nofollow" target="_blank"><code>SummarizedExperiment</code></a> objects to tibbles, performing operations, and converting back to the original format. This conversion process introduces substantial overhead when working with large-scale datasets. Each operation requires multiple data transformations, with the conversion to tibble format creating memory copies of the entire dataset, followed by the reverse conversion back to <a href="https://bioconductor.org/packages/SummarizedExperiment" rel="nofollow" target="_blank"><code>SummarizedExperiment</code></a>. For datasets containing hundreds of samples and tens of thousands of genes, these repeated conversions can consume memory and add significant computational overhead to even simple operations such as filtering or grouping.</p>
<p>With the new <a href="https://bioconductor.org/packages/tidySummarizedExperiment" rel="nofollow" target="_blank"><code>tidySummarizedExperiment</code></a> release (<a href="https://github.com/tidyomics/tidySummarizedExperiment/releases/tag/v1.19.7" rel="nofollow" target="_blank">v1.19.7</a>), we have introduced new optimisations that address these performance limitations. These optimisations work by:</p>
<ol type="1">
<li>Checking the query domain (assay, colData, rowData) and executing a specialised operation for that domain.</li>
<li>Using <a href="https://bioconductor.org/packages/plyxp" rel="nofollow" target="_blank"><code>plyxp</code></a> for complex domain-specific queries.</li>
</ol>
<p><em>plyxp</em> is a tidyomics package developed by <a href="https://github.com/jtlandis" rel="nofollow" target="_blank">Justin Landis</a>, and first released as part of Bioconductor 3.20 in October 2024. It uses data-masking functionality from the <a href="https://rlang.r-lib.org/" rel="nofollow" target="_blank">rlang</a> package to perform efficient operations on <em>SummarizedExperiment</em> objects.</p>
<section id="motivation-and-design-principles" class="level3">
<h3 class="anchored" data-anchor-id="motivation-and-design-principles">Motivation and design principles</h3>
<p>This benchmark supports ongoing work to improve the performance of <a href="https://bioconductor.org/packages/tidySummarizedExperiment" rel="nofollow" target="_blank"><code>tidySummarizedExperiment</code></a>. Below, we show up to a ~26x improvement in operations such as <code>mutate()</code>.</p>
<p>The current optimisation is grounded in three principles:</p>
<ul>
<li>Decompose operation series: break <code>mutate(a=..., b=..., c=...)</code> into single operations for simpler handling and clearer routing. Reference implementation in <code>R/mutate.R</code> (decomposition step) at <a href="https://github.com/tidyomics/tidySummarizedExperiment/blob/92072d71f9d3b9a82cfc5fdced8e52477c44d80f/R/mutate.R#L146" rel="nofollow" target="_blank">L146</a>.</li>
<li>Analyse scope: infer whether each expression targets <code>colData</code>, <code>rowData</code>, <code>assays</code>, or a mix (noting that the current analyser is likely over-engineered and could be simplified). See <a href="https://github.com/tidyomics/tidySummarizedExperiment/blob/92072d71f9d3b9a82cfc5fdced8e52477c44d80f/R/mutate.R#L149" rel="nofollow" target="_blank">L149</a>.</li>
<li>Route mixed operations via plyxp: when an expression touches multiple slots, prefer the plyxp path for correctness and performance. See <a href="https://github.com/tidyomics/tidySummarizedExperiment/blob/92072d71f9d3b9a82cfc5fdced8e52477c44d80f/R/mutate.R#L155" rel="nofollow" target="_blank">L155</a>.</li>
</ul>
<p>These design choices aim to preserve dimnames, avoid unnecessary tibble round-trips, and provide predictable performance across simple and mixed-slot scenarios.</p>
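<p>As a rough illustration of the scope-analysis principle, scope inference can be done in base R with <code>all.vars()</code>: collect the variables an expression references and check which slot's names they belong to. The function below is a simplified stand-in for illustration, not the package's actual analyser:</p>

```r
# Minimal scope classifier (illustrative only): decide whether a mutate()
# expression touches colData, rowData, assays, or a mix of them
classify_scope <- function(expr, coldata_vars, rowdata_vars, assay_names) {
  vars <- all.vars(expr)  # every symbol referenced by the expression
  touches <- c(
    coldata_only = any(vars %in% coldata_vars),
    rowdata_only = any(vars %in% rowdata_vars),
    assay_only   = any(vars %in% assay_names)
  )
  if (sum(touches) > 1) "mixed"                 # route to the plyxp path
  else if (sum(touches) == 1) names(touches)[touches]
  else "unknown"                                # fall back to the tibble path
}

classify_scope(quote(avgLength + 5), "avgLength", "gene_id", "counts")
# "coldata_only"
classify_scope(quote(counts * avgLength), "avgLength", "gene_id", "counts")
# "mixed"
```

<p>The real analyser must also handle grouping, key columns, and functions of multiple columns, which is where most of its complexity comes from.</p>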
</section>
<section id="example-of-code-optimisation" class="level3">
<h3 class="anchored" data-anchor-id="example-of-code-optimisation">Example of code optimisation</h3>
<p>This was the <code>mutate()</code> method before optimisation. The previous implementation relied on <code>as_tibble() |&gt; dplyr::mutate() |&gt; update_SE_from_tibble(.data)</code>.</p>
<p>The function <code>update_SE_from_tibble</code> interprets the input tibble and converts it back to a <code>SummarizedExperiment</code>. Although this step provides great generality and flexibility, it is particularly expensive because it must infer whether columns are sample-wise or feature-wise.</p>
<div class="cell">
<details class="code-fold">
<summary>Show pre-optimization source</summary>
<pre>mutate.SummarizedExperiment &lt;- function(.data, ...) {
    # Legacy implementation of mutate() for SummarizedExperiment:
    # - Validates requested edits against special/view-only columns
    # - Performs mutate() via tibble round-trip, then reconstructs the SE
    # Check that we are not modifying a key column
    cols &lt;- enquos(...) |&gt; names()
    
    # Deprecation of special column names:
    # capture all quoted args to detect deprecated special-column usage
    .cols &lt;- enquos(..., .ignore_empty=&quot;all&quot;) %&gt;% 
        map(~ quo_name(.x)) %&gt;% unlist()
    if (is_sample_feature_deprecated_used(.data, .cols)) {
        # Record deprecated usage into metadata for backward compatibility
        .data &lt;- ping_old_special_column_into_metadata(.data)
    }
    
    # Identify view-only/special columns (sample/feature keys, etc.)
    # Use a small slice to reduce overhead while probing structure
    special_columns &lt;- get_special_columns(
        # Decrease the size of the dataset
        .data[1:min(100, nrow(.data)), 1:min(20, ncol(.data))]
    ) |&gt; c(get_needed_columns(.data))
    
    # Are any requested targets among special/view-only columns?
    tst &lt;-
        intersect(
            cols,
            special_columns
        ) |&gt; 
        length() |&gt;
        gt(0)

    if (tst) {
        columns &lt;-
            special_columns |&gt;
                paste(collapse=&quot;, &quot;)
        stop(
            &quot;tidySummarizedExperiment says:&quot;,
            &quot; you are trying to rename a column that is view only&quot;,
            columns,
            &quot;(it is not present in the colData).&quot;,
            &quot; If you want to mutate a view-only column,&quot;,
            &quot; make a copy and mutate that one.&quot;
        )
    }

    # If Ranges column not in query, prefer faster tibble conversion
    # Skip expanding GRanges columns when not referenced
    skip_GRanges &lt;-
        get_GRanges_colnames() %in% 
        cols |&gt;
        not()
    
    # Round-trip: SE -&gt; tibble -&gt; dplyr::mutate -&gt; SE
    .data |&gt;
        as_tibble(skip_GRanges=skip_GRanges) |&gt;
        dplyr::mutate(...) |&gt;
        update_SE_from_tibble(.data)
}</pre>
</details>
</div>
<p>The new implementation captures all easy cases, such as sample-only and feature-only metadata <code>mutate()</code>. If <code>mutate()</code> is a mixed operation that can be factored into sample- and feature-wise operations, it is handled by <code>plyxp</code>. Otherwise, the general solution is used.</p>
<p>Key components to compare:</p>
<ul>
<li>The pre-optimization code always uses a tibble round-trip (<code>as_tibble() |&gt; dplyr::mutate() |&gt; update_SE_from_tibble()</code>).</li>
<li>The optimized code first analyzes scope (<code>colData</code>, <code>rowData</code>, <code>assay</code>, or mixed) and dispatches to specialized paths.</li>
<li>The fallback still exists (<code>mutate_via_tibble</code>) for complex cases, preserving generality.</li>
</ul>
<div class="cell">
<details class="code-fold">
<summary>Show post-optimization source</summary>
<pre>mutate.SummarizedExperiment &lt;- function(.data, ...) {

    # Check if query is composed (multiple expressions)
    if (is_composed(&quot;mutate&quot;, ...)) return(decompose_tidy_operation(&quot;mutate&quot;, ...)(.data))

    # Check for scope and dispatch elegantly
    scope_report &lt;- analyze_query_scope_mutate(.data, ...)
    scope &lt;- scope_report$scope

    result &lt;-
        if (scope == &quot;coldata_only&quot;) modify_samples(.data, &quot;mutate&quot;, ...)
        else if (scope == &quot;rowdata_only&quot;) modify_features(.data, &quot;mutate&quot;, ...)
        else if (scope == &quot;assay_only&quot;) mutate_assay(.data, ...)
        else if (scope == &quot;mixed&quot;) modify_se_plyxp(.data, &quot;mutate&quot;, scope_report, ...)
        else mutate_via_tibble(.data, ...)

    # Record latest mutate scope into metadata for testing/introspection
    meta &lt;- S4Vectors::metadata(result)
    if (is.null(meta)) meta &lt;- list()
    meta$latest_mutate_scope_report &lt;- scope_report
    S4Vectors::metadata(result) &lt;- meta

    return(result)
}</pre>
</details>
</div>
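<p>The decomposition step that <code>decompose_tidy_operation()</code> performs can be illustrated in base R without any of the package internals: split a multi-assignment <code>mutate()</code> into single assignments applied in order, so later expressions can see earlier results. The helper below is a hypothetical sketch that works on plain data frames only:</p>

```r
# Illustrative decomposition of mutate(a = ..., b = ...) into single steps
decompose_mutate <- function(.data, ...) {
  exprs <- as.list(substitute(list(...)))[-1]  # capture unevaluated expressions
  for (nm in names(exprs)) {
    # Evaluate each expression against the current data, then store the result,
    # so z = y + 1 can refer to a column y created one step earlier
    .data[[nm]] <- eval(exprs[[nm]], envir = .data, enclos = parent.frame())
  }
  .data
}

df <- decompose_mutate(data.frame(x = 1:3), y = x * 2, z = y + 1)
df$z  # 3 5 7
```

<p>Once each expression stands alone, routing it to a colData-, rowData-, or assay-specific path becomes a per-expression decision rather than a whole-call one.</p>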
</section>
<section id="benchmarking-overview" class="level1">
<h1>Benchmarking Overview</h1>
<p>This vignette benchmarks a set of <a href="https://tidyomics.github.io/tidySummarizedExperiment/reference/mutate.html" rel="nofollow" target="_blank"><code>mutate()</code></a>, <a href="https://tidyomics.github.io/tidySummarizedExperiment/reference/filter.html" rel="nofollow" target="_blank"><code>filter()</code></a>, <a href="https://tidyomics.github.io/tidySummarizedExperiment/reference/select.html" rel="nofollow" target="_blank"><code>select()</code></a>, and <a href="https://tidyomics.github.io/tidySummarizedExperiment/reference/distinct.html" rel="nofollow" target="_blank"><code>distinct()</code></a> scenarios (see <a href="https://bioconductor.org/packages/tidySummarizedExperiment" rel="nofollow" target="_blank">documentation</a>), comparing performance before and after optimisation. Each commit is checked out explicitly via <code>git worktree</code>, its code is loaded with <code>devtools::load_all()</code>, the same scenarios are run multiple times, and the runtimes are compared with ggplot boxplots.</p>
<ul>
<li>Before optimisation: <a href="https://github.com/tidyomics/tidySummarizedExperiment/commit/87445757d2d0332e7d335d22cd28f73568b7db66" rel="nofollow" target="_blank">commit 87445757d2d0332e7d335d22cd28f73568b7db66</a></li>
<li>After optimisation: <a href="https://github.com/tidyomics/tidySummarizedExperiment/commit/9f7c26e0519c92f9682b270d566da127367bcbc0" rel="nofollow" target="_blank">commit 9f7c26e0519c92f9682b270d566da127367bcbc0</a></li>
</ul>
<section id="setup-helper-functions" class="level3">
<h3 class="anchored" data-anchor-id="setup-helper-functions">Setup helper functions</h3>
<div class="cell">
<details class="code-fold">
<summary>Show the code</summary>
<pre>suppressPackageStartupMessages({
  library(ggplot2)
  library(dplyr)
  library(SummarizedExperiment)
  library(rlang)
  library(devtools)
  library(airway)
  library(microbenchmark)
  library(reactable)
  library(patchwork)
})

load_branch_code &lt;- function(worktree_dir) {
  if (!requireNamespace(&quot;devtools&quot;, quietly = TRUE)) stop(&quot;Please install devtools to run this vignette.&quot;)
  # Debug: print the directory we're looking for
  cat(&quot;Looking for worktree directory:&quot;, worktree_dir, &quot;\n&quot;)
  cat(&quot;Directory exists:&quot;, dir.exists(worktree_dir), &quot;\n&quot;)
  cat(&quot;Current working directory:&quot;, getwd(), &quot;\n&quot;)
  # Check if directory exists
  if (!dir.exists(worktree_dir)) {
    stop(paste(&quot;Worktree directory does not exist:&quot;, worktree_dir))
  }
  suppressMessages(devtools::load_all(worktree_dir, quiet = TRUE))
}

create_airway_test_se &lt;- function() {
  suppressPackageStartupMessages(library(airway))
  data(airway)
  se &lt;- airway
  se[1:200, ]
}

benchmark_scenarios &lt;- function() {
  list(
    coldata_simple_assignment = quo({ se |&gt; mutate(new_dex = dex) }),
    coldata_arithmetic = quo({ se |&gt; mutate(avgLength_plus_5 = avgLength + 5) }),
    coldata_concat = quo({ se |&gt; mutate(sample_info = paste(cell, dex, SampleName, sep = &quot;_&quot;)) }),
    coldata_grouped_mean = quo({ se |&gt; group_by(dex) |&gt; mutate(avgLength_group_mean = mean(avgLength)) |&gt; ungroup() }),
    assay_simple_assignment = quo({ se |&gt; mutate(counts_copy = counts) }),
    assay_plus_one = quo({ se |&gt; mutate(counts_plus_1 = counts + 1) }),
    assay_log = quo({ se |&gt; mutate(log_counts_manual = log2(counts + 1)) }),
    complex_conditional_coldata = quo({ se |&gt; mutate(length_group = ifelse(avgLength &gt; mean(avgLength), &quot;longer&quot;, &quot;shorter&quot;)) }),
    complex_nested = quo({ se |&gt; mutate(complex_category = ifelse(dex == &quot;trt&quot; & avgLength &gt; mean(avgLength), &quot;treated_long&quot;, ifelse(dex == &quot;untrt&quot;, &quot;untreated&quot;, &quot;other&quot;))) }),
    mixed_assay_coldata = quo({ se |&gt; mutate(new_counts = counts * avgLength) }),
    multiple_simple_assay = quo({ se |&gt; mutate(normalized_counts = counts / 1000, sqrt_counts = sqrt(counts)) }),
    chained_mutates = quo({ se |&gt; mutate(tmp = avgLength * 2) |&gt; mutate(flag = ifelse(tmp &gt; mean(tmp), 1, 0)) }),

    # Filter benchmarks (scoped and non-rectangular)
    filter_coldata_simple = quo({ se |&gt; filter(dex == &quot;trt&quot;) }),
    filter_coldata_numeric = quo({ se |&gt; filter(avgLength &gt; median(avgLength)) }),
    filter_assay_nonrect = quo({ se |&gt; filter(counts &gt; 0) }),

    # Select benchmarks (covering colData-only, rowData-only, assays-only, mixed)
    select_coldata_simple = quo({ se |&gt; select(.sample, dex) }),
    select_rowdata_simple = quo({ se |&gt; select(.feature) }),
    select_assay_only = quo({ se |&gt; select(counts) }),
    select_mixed_keys_counts = quo({ se |&gt; select(.sample, .feature, counts) }),
    select_coldata_wide = quo({ se |&gt; select(.sample, dex, avgLength, SampleName) }),

    # Distinct benchmarks (covering colData-only, rowData-only, assays-only, mixed)
    distinct_coldata_simple = quo({ se |&gt; distinct(dex) }),
    distinct_coldata_multiple = quo({ se |&gt; distinct(dex, avgLength) }),
    distinct_rowdata_simple = quo({ se |&gt; distinct(.feature) }),
    distinct_assay_only = quo({ se |&gt; distinct(counts) }),
    distinct_mixed_keys_counts = quo({ se |&gt; distinct(.sample, .feature, counts) }),
    distinct_coldata_wide = quo({ se |&gt; distinct(.sample, dex, avgLength, SampleName) }),
    distinct_with_keep_all = quo({ se |&gt; distinct(dex, .keep_all = TRUE) }),
    distinct_complex_expression = quo({ se |&gt; distinct(dex, avgLength) })
  )
}

run_one &lt;- function(expr_quo, reps = 5L) {
  se_base &lt;- create_airway_test_se()
  mb &lt;- microbenchmark::microbenchmark(
    eval_tidy(expr_quo),
    times = reps,
    setup = { se &lt;- se_base },          # reuse the same input, avoid recreating inside the timed expr
    control = list(warmup = 2L)
  )
  # microbenchmark returns nanoseconds; convert to milliseconds
  as.numeric(mb$time) / 1e6
}

run_all_scenarios &lt;- function(branch_label, reps = 7L) {
  scenarios &lt;- benchmark_scenarios()
  out &lt;- list()
  for (nm in names(scenarios)) {
    tms &lt;- run_one(scenarios[[nm]], reps = reps)
    out[[length(out) + 1L]] &lt;- data.frame(
      branch = branch_label,
      scenario = nm,
      replicate = seq_along(tms),
      elapsed_ms = tms,
      stringsAsFactors = FALSE
    )
  }
  bind_rows(out)
}

# Parallel version: run each scenario on a separate worker
run_all_scenarios_parallel &lt;- function(branch_label, reps = 20L, workers = 1L, initializer = NULL) {
  scenarios &lt;- benchmark_scenarios()
  nms &lt;- names(scenarios)
  old_plan &lt;- future::plan()
  on.exit(future::plan(old_plan), add = TRUE)
  future::plan(future::multisession, workers = workers)
  res &lt;- future.apply::future_lapply(nms, function(nm) {
    if (!is.null(initializer)) initializer()
    tms &lt;- run_one(scenarios[[nm]], reps = reps)
    data.frame(
      branch = branch_label,
      scenario = nm,
      replicate = seq_along(tms),
      elapsed_ms = tms,
      stringsAsFactors = FALSE
    )
  }, future.seed = TRUE)
  dplyr::bind_rows(res)
}</pre>
</details>
</div>
</section>
<section id="run-benchmarks-on-both-branches" class="level3">
<h3 class="anchored" data-anchor-id="run-benchmarks-on-both-branches">Run benchmarks on both branches</h3>
<div class="cell">
<details class="code-fold">
<summary>Show the code</summary>
<pre># Worktree directories (already exist in the post directory)
wt_before &lt;- &quot;.__bench_before__&quot;
wt_after &lt;- &quot;.__bench_after__&quot;

# Verify worktrees exist
if (!dir.exists(wt_before)) {
  stop(&quot;Worktree directory does not exist: &quot;, wt_before)
}
if (!dir.exists(wt_after)) {
  stop(&quot;Worktree directory does not exist: &quot;, wt_after)
}

# Before optimisation (commit 87445757)
load_branch_code(wt_before)</pre>
</details>
<div class="cell-output cell-output-stdout">
<pre>Looking for worktree directory: .__bench_before__ 
Directory exists: TRUE 
Current working directory: /Users/a1234450/Documents/GitHub/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization </pre>
</div>
<details class="code-fold">
<summary>Show the code</summary>
<pre>res_before &lt;- run_all_scenarios(branch_label = &quot;before_optimization&quot;, reps = 10L)

# After optimisation (commit 9f7c26e)
load_branch_code(wt_after)</pre>
</details>
<div class="cell-output cell-output-stdout">
<pre>Looking for worktree directory: .__bench_after__ 
Directory exists: TRUE 
Current working directory: /Users/a1234450/Documents/GitHub/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization </pre>
</div>
<details class="code-fold">
<summary>Show the code</summary>
<pre>res_after &lt;- run_all_scenarios(branch_label = &quot;after_optimization&quot;, reps = 10L)

results &lt;- dplyr::bind_rows(res_before, res_after) |&gt;
  dplyr::mutate(operation = dplyr::case_when(
    grepl(&quot;^filter&quot;, scenario) ~ &quot;filter&quot;,
    grepl(&quot;^select&quot;, scenario) ~ &quot;select&quot;,
    grepl(&quot;^distinct&quot;, scenario) ~ &quot;distinct&quot;,
    TRUE ~ &quot;mutate&quot;
  ))

summary_table &lt;- results |&gt;
  group_by(branch, scenario) |&gt;
  summarise(median_ms = median(elapsed_ms), .groups = &quot;drop&quot;) |&gt;
  tidyr::pivot_wider(names_from = branch, values_from = median_ms) |&gt; 
  dplyr::mutate(speedup = round(before_optimization / after_optimization, 2))</pre>
</details>
</div>
<div class="cell">
<div class="cell-output-display">
<div class="reactable html-widget html-fill-item" id="htmlwidget-226268d0e09b28952dc5" style="width:auto;height:auto;"></div>
<script type="application/json" data-for="htmlwidget-226268d0e09b28952dc5">{"x":{"tag":{"name":"Reactable","attribs":{"data":{"scenario":["assay_log","assay_plus_one","assay_simple_assignment","chained_mutates","coldata_arithmetic","coldata_concat","coldata_grouped_mean","coldata_simple_assignment","complex_conditional_coldata","complex_nested","distinct_assay_only","distinct_coldata_multiple","distinct_coldata_simple","distinct_coldata_wide","distinct_complex_expression","distinct_mixed_keys_counts","distinct_rowdata_simple","distinct_with_keep_all","filter_assay_nonrect","filter_coldata_numeric","filter_coldata_simple","mixed_assay_coldata","multiple_simple_assay","select_assay_only","select_coldata_simple","select_coldata_wide","select_mixed_keys_counts","select_rowdata_simple"],"after_optimization":[12.4554805,12.328708,11.681375,21.5625,11.0071665,11.0921455,93.9895835,11.321708,11.3674785,11.787646,183.6270415,104.3887705,103.090917,172.007979,108.229458,211.791104,102.567687,116.4211665,74.754,21.781209,22.755375,29.610208,25.965729,174.4993535,98.666563,102.3431045,115.9610005,97.5994795],"before_optimization":[289.085042,289.313729,288.7251875,548.727708,288.0155005,281.744729,88.8149785,291.0410415,271.0590425,274.0804585,197.556417,118.027792,116.40925,119.396937,117.7116455,199.8871875,114.4365215,118.1221665,104.8900625,25.817354,26.3123745,283.314792,299.2484995,181.386271,123.428688,113.5046455,117.837937,113.6867925],"speedup":[23.21,23.47,24.72,25.45,26.17,25.4,0.94,25.71,23.85,23.25,1.08,1.13,1.13,0.69,1.09,0.94,1.12,1.01,1.4,1.19,1.16,9.57,11.52,1.04,1.25,1.11,1.02,1.16]},"columns":[{"id":"scenario","name":"Scenario","type":"character","minWidth":220,"align":"left"},{"id":"after_optimization","name":"After (ms)","type":"numeric","minWidth":120,"align":"left","format":{"cell":{"digits":1},"aggregated":{"digits":1}}},{"id":"before_optimization","name":"Before (ms)","type":"numeric","minWidth":120,"align":"left","format":{"cell":{"digits":1},"aggregated":{"digits":1}}},{"id":"speedup","name":"Speedup (x)","type":"numeric","minWidth":120,"align":"left","format":{"cell":{"digits":2},"aggregated":{"digits":2}}}],"filterable":true,"searchable":true,"defaultPageSize":10,"highlight":true,"bordered":true,"striped":true,"compact":true,"dataKey":"5e96953b8fc47c84831c51a6f5bf258f"},"children":[]},"class":"reactR_markup"},"evals":[],"jsHooks":[]}</script>
</div>
</div>
</section>
</section>
<section id="visualize-with-combined-performance-plots" class="level1">
<h1>Visualize with combined performance plots</h1>
<div class="cell">
<details class="code-fold">
<summary>Show the code</summary>
<pre>dodge_w &lt;- 0.7

p_box &lt;- ggplot(results, aes(x = scenario, y = elapsed_ms, fill = branch)) +
  geom_boxplot(position = position_dodge(width = dodge_w), width = 0.7, outlier.shape = NA) +

  # Add jittered points aligned with the dodged boxplots
  geom_point(
    position = position_jitterdodge(jitter.width = 0.1, jitter.height = 0, dodge.width = dodge_w), 
    alpha = 0.6, 
    size = 0.5
  ) +
  scale_y_log10() + 
  coord_flip() +
  facet_grid(operation ~ ., scales = &quot;free_y&quot;, space = &quot;free_y&quot;) +
  annotation_logticks(sides = &quot;b&quot;) +
  labs(title = &quot;Performance comparison: Before vs After optimization&quot;,
       x = &quot;Scenario&quot;,
       y = &quot;Elapsed (ms)&quot;) +
  theme_bw() +
  
  # Angle x labels  
  theme(legend.position = &quot;top&quot;, axis.text.x = element_text(angle = 45, hjust = 1))

# Speedup summary panel (median before/after ratio)
speedup_plot_data &lt;- summary_table |&gt;
  dplyr::mutate(operation = dplyr::case_when(
    grepl(&quot;^filter&quot;, scenario) ~ &quot;filter&quot;,
    grepl(&quot;^select&quot;, scenario) ~ &quot;select&quot;,
    grepl(&quot;^distinct&quot;, scenario) ~ &quot;distinct&quot;,
    TRUE ~ &quot;mutate&quot;
  ))

p_speedup &lt;- ggplot(
  speedup_plot_data,
  aes(x = speedup, y = reorder(scenario, speedup))
) +
  geom_col(width = 0.7, fill = &quot;grey70&quot;, color = &quot;grey40&quot;) +
  facet_grid(operation ~ ., scales = &quot;free_y&quot;, space = &quot;free_y&quot;) +
  labs(
    title = &quot;Median speedup by scenario&quot;,
    x = &quot;Speedup (before/after, x)&quot;,
    y = NULL
  ) +
  theme_bw() +
  theme(legend.position = &quot;none&quot;)

combined_plot &lt;- p_box + p_speedup + patchwork::plot_layout(widths = c(2.3, 1))
combined_plot</pre>
</details>
<div class="cell-output-display">
<div>
<figure class="figure">
<p><img src="https://i0.wp.com/tidyomics.github.io/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization/index_files/figure-html/plot-1.png?w=450&#038;ssl=1" class="img-fluid figure-img"  data-recalc-dims="1"></p>
</figure>
</div>
</div>
<details class="code-fold">
<summary>Show the code</summary>
<pre># Save the combined figure
ggsave(&quot;benchmark_plot.png&quot;, plot = combined_plot, width = 14, height = 8)</pre>
</details>
</div>
<section id="interpreting-the-benchmark-results" class="level3">
<h3 class="anchored" data-anchor-id="interpreting-the-benchmark-results">Interpreting the benchmark results</h3>
<p>Across all scenarios, speedup ranges from <strong>0.69x</strong> to <strong>26.17x</strong>.</p>
<p>Operations with the strongest gains are: <strong>coldata_arithmetic (26.17x), coldata_simple_assignment (25.71x), chained_mutates (25.45x)</strong>.</p>
<p>Lower-gain outliers are: <strong>distinct_coldata_wide (0.69x), coldata_grouped_mean (0.94x), distinct_mixed_keys_counts (0.94x)</strong>.</p>
<p>By operation family, median speedup is: <strong>mutate (23.66x), filter (1.19x), select (1.11x), distinct (1.08x)</strong>.</p>
</section>
</section>
<section id="session-info" class="level1">
<h1>Session Info</h1>
<div class="cell">
<div class="cell-output cell-output-stdout">
<pre>R version 4.5.3 (2026-03-11)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.6.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Australia/Adelaide
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] tidySummarizedExperiment_1.19.7 tidyr_1.3.2                    
 [3] testthat_3.3.2                  ttservice_0.5.3                
 [5] patchwork_1.3.2                 reactable_0.4.5                
 [7] rlang_1.1.7                     microbenchmark_1.5.0           
 [9] airway_1.30.0                   SummarizedExperiment_1.40.0    
[11] Biobase_2.70.0                  GenomicRanges_1.62.1           
[13] Seqinfo_1.0.0                   IRanges_2.44.0                 
[15] S4Vectors_0.48.0                BiocGenerics_0.56.0            
[17] generics_0.1.4                  MatrixGenerics_1.22.0          
[19] matrixStats_1.5.0               dplyr_1.2.0                    
[21] ggplot2_4.0.2                   devtools_2.4.6                 
[23] usethis_3.2.1                  

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1    viridisLite_0.4.3   farver_2.1.2       
 [4] S7_0.2.1            fastmap_1.2.0       lazyeval_0.2.2     
 [7] digest_0.6.39       plyxp_1.4.3         lifecycle_1.0.5    
[10] ellipsis_0.3.2      magrittr_2.0.4      compiler_4.5.3     
[13] tools_4.5.3         yaml_2.3.12         data.table_1.18.2.1
[16] knitr_1.51          S4Arrays_1.10.1     labeling_0.4.3     
[19] htmlwidgets_1.6.4   pkgbuild_1.4.8      DelayedArray_0.36.0
[22] RColorBrewer_1.1-3  pkgload_1.5.0       abind_1.4-8        
[25] withr_3.0.2         purrr_1.2.1         desc_1.4.3         
[28] grid_4.5.3          fansi_1.0.7         scales_1.4.0       
[31] cli_3.6.5           rmarkdown_2.30      ragg_1.5.1         
[34] remotes_2.5.0       otel_0.2.0          rstudioapi_0.18.0  
[37] httr_1.4.8          sessioninfo_1.2.3   cachem_1.1.0       
[40] stringr_1.6.0       XVector_0.50.0      vctrs_0.7.1        
[43] Matrix_1.7-4        jsonlite_2.0.0      systemfonts_1.3.2  
[46] crosstalk_1.2.2     plotly_4.12.0       glue_1.8.0         
[49] reactR_0.6.1        stringi_1.8.7       gtable_0.3.6       
[52] tibble_3.3.1        pillar_1.11.1       htmltools_0.5.9    
[55] brio_1.1.5          R6_2.6.1            textshaping_1.0.5  
[58] rprojroot_2.1.1     evaluate_1.0.5      lattice_0.22-9     
[61] memoise_2.0.1       SparseArray_1.10.9  xfun_0.56          
[64] fs_1.6.7            pkgconfig_2.0.3    </pre>
</div>
</div>


</section>

<p>
© 2025 tidyomics. Content is published under <a href="https://creativecommons.org/licenses/by/4.0/" rel="nofollow" target="_blank">Creative Commons CC-BY-4.0 License</a> for the text and <a href="https://opensource.org/licenses/BSD-3-Clause" rel="nofollow" target="_blank">BSD 3-Clause License</a> for any code. | <a href="https://www.r-bloggers.com/" rel="nofollow" target="_blank">R-Bloggers</a>
</p> 
<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://tidyomics.github.io/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization/"> tidyomicsBlog</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/speeding-up-tidysummarizedexperiment-through-query-optimisation-and-the-plyxp-backend/">Speeding up tidySummarizedExperiment through query optimisation and the plyxp backend</a>]]></content:encoded>
					
		
		<enclosure url="https://tidyomics.github.io/tidyomicsBlog/posts/2025-10-25-tidySummarizedExperiment-optimization/benchmark_plot.png" length="0" type="image/png" />

		<post-id xmlns="com-wordpress:feed-additions:1">400026</post-id>	</item>
		<item>
		<title>Pacific island demography, the narrative by @ellis2013nz</title>
		<link>https://www.r-bloggers.com/2026/03/pacific-island-demograpy-the-narrative-by-ellis2013nz/</link>
		
		<dc:creator><![CDATA[free range statistics - R]]></dc:creator>
		<pubDate>Sat, 21 Mar 2026 12:00:00 +0000</pubDate>
				<category><![CDATA[R bloggers]]></category>
		<guid isPermaLink="false">https://freerangestats.info/blog/2026/03/22/pacific-people-narrative</guid>

					<description><![CDATA[<div style = "width:60%; display: inline-block; float:left; "> This post is the last in a series of seven on population and people movement issues in the Pacific. The first six posts featured code re-generating the charts I used in a keynote speech before the November 2025 meeting of the Pacific Heads of Planning ...</div>
<div style = "width: 40%; display: inline-block; float:right;"></div>
<div style="clear: both;"></div>
<strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/pacific-island-demograpy-the-narrative-by-ellis2013nz/">Pacific island demography, the narrative by @ellis2013nz</a>]]></description>
										<content:encoded><![CDATA[<!-- 
<div style="min-height: 30px;">
[social4i size="small" align="align-left"]
</div>
-->

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 12px;">
[This article was first published on  <strong><a href="https://freerangestats.info/blog/2026/03/22/pacific-people-narrative"> free range statistics - R</a></strong>, and kindly contributed to <a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers</a>].  (You can report issues about the content on this page <a href="https://www.r-bloggers.com/contact-us/">here</a>)
<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div>
<p>This post is the last in a series of seven on population and people movement issues in the Pacific. The first six posts featured code re-generating the charts I used in a keynote speech before the November 2025 meeting of the Pacific Heads of Planning and Statistics in Wellington, New Zealand. Today’s post is simply a narrative drawing on all those charts. There’s no R code today; links to the previous posts are at the bottom.</p>

<h2 id="population-growth-is-varied-but-the-larger-pacific-island-countries-are-growing-pretty-fast">Population growth is varied, but the larger Pacific island countries are growing pretty fast</h2>

<p>The first point to make is that the Pacific is very varied in terms of its population dynamics. We can see this a bit in the most straightforward and intuitive chart of population historical growth and future projections:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0305-population-line.svg" width="450"><img src="https://i1.wp.com/freerangestats.info/img/0305-population-line.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>The data come from the United Nations&#8217; 2024 population projections, which are currently used as-is in the Pacific Data Hub. Some of these figures are known to be wrong, either because there have been censuses that reported too late to be taken into account (e.g. Federated States of Micronesia) or for other reasons (e.g. Tokelau). But they&#8217;re a good starting point.</p>

<p>The vertical scale on that chart is &#8220;free&#8221;, meaning each country facet is on a different scale, so it&#8217;s not easy to tell visually which countries are larger or smaller. To help with this, they are organised from smallest (Niue) to largest (Papua New Guinea). Pitcairn Islands, although a member of the Pacific Community, is not included because of its tiny size (around 50 people) even by Pacific standards.</p>

<p>A bit of familiarity with the region, helped by the sequencing on the chart, identifies that the larger countries&#8212;those in the bottom row from Samoa to Papua New Guinea, up to say Kiribati and Guam in the row above&#8212;are growing faster as well as being larger. Well, that&#8217;s hardly surprising, is it? Faster-growing countries will of course become larger, so we&#8217;d expect these things to be related. Except that the causality is not all in one direction like that. At its extreme, there&#8217;s more to the fact that Niue is only one or two thousand people and Papua New Guinea is 10,000 times its size (about 11 million) than that Papua New Guinea has grown faster recently. Niue&#8217;s small physical land mass, along with other characteristics, is one of the reasons it doesn&#8217;t grow as fast.</p>

<p>The next chart is an attempt to illustrate this further. First, consider a version where we just put size on the horizontal axis and growth on the vertical. Both axes are logarithmically transformed, or else the chart would be all blank space with Papua New Guinea out on the right and all other countries a cluster of dots on top of each other on the left.</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0305-population-scatter.svg" width="450"><img src="https://i0.wp.com/freerangestats.info/img/0305-population-scatter.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>I quite like this chart, for audiences with enough numeracy to cope with a scatter plot and log axes. The pink shading makes clear which countries are shrinking, a point of definite interest. The colour-coding of points by sub-region is very useful, making clear how the Melanesian countries cluster together in the top right “large and growing fast” quadrant; most Polynesian countries (not all) are shrinking; and Micronesia has a real mix. The next version of this chart adds a particularly interesting element to this (in a slide show, you can move from one chart to the next and it looks like these circles appear by magic):</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0305-population-scatter-highlighted.svg" width="450"><img src="https://i1.wp.com/freerangestats.info/img/0305-population-scatter-highlighted.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>Those circles represent the countries or territories that have some kind of easy migration access to a larger, richer country. This includes (conveniently, mostly in sets of three):</p>

<ul>
  <li>three French territories (French Polynesia, Wallis and Futuna, New Caledonia);</li>
  <li>three Realm of New Zealand territories (Cook Islands, Tokelau, Niue);</li>
  <li>Pitcairn Island whose residents (mostly) have right of abode in the UK;</li>
  <li>three USA territories (Guam, Commonwealth of the Northern Mariana Islands, American Samoa);</li>
  <li>three independent countries with a Compact of Free Association with the USA (Palau, Marshall Islands, Federated States of Micronesia).</li>
</ul>

<p>Now we have an interesting feature. All of the countries with negative growth apart from Tonga and Tuvalu are highlighted this way. Perhaps in future we would include Tuvalu in the list above because it does now have special arrangements with Australia and New Zealand that allow a certain amount of people movement, most importantly the Falepili Mobility Pathway scheme that allows 280 permanent residence visas per year with Australia. But this is too recent to show up in the chart above.</p>

<p>Samoa and Tonga do not have automatic right of residence to New Zealand but they have strong cultural and historical ties, large communities already living there (more on this later), and the visa obstacles are largely surmountable.</p>

<p>In my view the distinction between the countries that have easy mobility to a larger, richer country, and those that don&#8217;t, is the most important single marker to use when considering population issues in the Pacific.</p>

<h2 id="net-migration-is-a-critical-factor-in-different-population-growth-rates">Net migration is a critical factor in different population growth rates</h2>

<p>OK, let&#8217;s look at net migration. The United Nations has to estimate this as part of its projection process, so we can see it for all the countries in its dataset. In the chart below, I&#8217;ve organised countries from those least impacted by migration (top left) to most (bottom right).</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0309-all-picts-net-migration.svg" width="450"><img src="https://i0.wp.com/freerangestats.info/img/0309-all-picts-net-migration.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>There are a few remarkable things here. One is that the chart is mostly red&#8212;migration is nearly all outwards. Another is that it leaps around a bit. We can suspect data problems for some of this: e.g. in the case of Tokelau&#8217;s recent years, I just think we have an error (it&#8217;s being looked at).</p>

<p>One way of drilling into the impact of migration on population, for just a subset of countries because it&#8217;s getting complicated, is to compare the natural rate of increase (i.e. births minus deaths) with net migration; between them, these add up to the total change in population for each country. Here&#8217;s a chart that does that, for just six countries:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0309-six-picts-natural-immmigration-line.svg" width="450"><img src="https://i2.wp.com/freerangestats.info/img/0309-six-picts-natural-immmigration-line.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>In each of these charts, we can start with the green dashed line, which is where natural increase would be with no migration. Then we add (or subtract, in most cases) the red dotted line which is net migration. The sum of these two is the blue solid line, total population change. The sort of things we see here are:</p>

<ul>
  <li>For Kiribati and Papua New Guinea, the solid blue line is fairly close to the dashed green line, indicating that natural increase is what is driving population change</li>
  <li>For Samoa and Marshall Islands, the green dashed natural increase line is very positive and the red dotted migration line is very negative. These two drivers mostly cancel each other out, but when all is added up Marshall Islands has rapid recent population decline and Samoa still has some remnant slow population growth</li>
  <li>For Niue and Marianas, the story is more complicated but in recent years has stabilised at “not much change”</li>
</ul>

<p>What we’re seeing here is that migration—or the lack of it in significant numbers, for Kiribati and Papua New Guinea—is what is driving the population story.</p>

<h2 id="migration-short-and-long-term-changes-the-shape-of-the-origin-countrys-demographics">Migration (short and long term) changes the <i>shape</i> of the origin country’s demographics</h2>

<p>What does this mean for the structure of who is left? To illustrate this, I like to compare Kiribati and Marshall Islands. Both are entirely or mostly coral atolls; they are only around an hour&#8217;s flight from each other; and they have few natural resources other than their people, the ocean and its fish, and location.</p>

<p>Location is a critical asset or curse for the Marshall Islands. Kwajalein Atoll in the Marshalls was a major Japanese base in World War II and the site of a bloody battle in 1944; now it is a key US base forming a bridging zone and operational depth between the so-called second (Guam, Palau, Saipan, etc.) and third (Aleutians, Hawaii, Samoa, New Zealand) <a href="https://en.wikipedia.org/wiki/Island_chain_strategy" rel="nofollow" target="_blank">island chains</a> in preparation for the next Pacific war, against whichever eastwards-facing Asian land power that might be. Bikini Atoll in the Marshalls was the site of US nuclear weapon testing during the cold war. For our purposes, all this matters because Marshall Islands has a Compact of Free Association with the USA which provides large amounts of funding plus free people movement to the USA. Even with recent crackdowns in the USA this persists, although dealing with Marshallese who have been forcibly returned from the USA due to low level criminal behaviour is becoming a policy challenge.</p>

<p>Kiribati has no such arrangement with its own large regional partner, Australia (former-British Kiribati uses the Australian dollar and drives on the left, just as Marshall Islands uses the US dollar and drives on the right hand side of the road).</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0314-kiribati-marshalls.svg" width="450"><img src="https://i1.wp.com/freerangestats.info/img/0314-kiribati-marshalls.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>To repeat some text from the blog post where I introduced this chart: Kiribati today has about four times the population of Marshall Islands but in 1980 was only about double. The significant thing here is the wasp waist of the Marshall Islands pyramid in 2025—while it had a similar shape to Kiribati in 1980. People at peak working and reproductive age are literally absent from today’s Marshall Islands—in this case, primarily in the USA.</p>

<p>The result of this is that Marshall Islands not only benefits from its individuals having more freedom of movement and opportunity, and sending back remittances from relatively high paying lives in the USA; but also having a pressure valve for what would otherwise be a rapidly (too fast?) growing population. To put it bluntly, Kiribati has a problem of too many people (particularly on crowded southern Tarawa); Marshall Islands, if it has a population problem, is one of too few. The contrast of crowded, relatively poor Tarawa and less-crowded, relatively well-off Majuro is an obvious and stark one to anyone travelling to them both in quick succession.</p>

<h2 id="pacific-people-are-overseas-in-very-considerable-numbers">Pacific people are overseas in very considerable numbers</h2>

<p>OK, so people have been moving from the Pacific islands to elsewhere for decades or longer. Proportionately speaking, does this matter? Have large numbers of Pacific islanders cumulatively ended up elsewhere? The following chart answers this with a resounding “yes”, for at least nine countries:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0313-diaspora-bar.svg" width="450"><img src="https://i0.wp.com/freerangestats.info/img/0313-diaspora-bar.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>For seven countries, there are more people ethnically associated with that country living in the USA, New Zealand and Australia than in the origin country. For the countries in the bottom row &#8211; the three New Zealand Realm countries plus Pitcairn &#8211; there are many times more people living overseas than at home. Around 40,000 Niueans live overseas (mostly in New Zealand) and fewer than 2,000 in Niue. For Tonga, Samoa and Marshall Islands the situation is not as extreme but still very substantial.</p>

<p>Illustrating this further, consider this chart of the world’s largest Pacific Islander cities. Thanks to comments on LinkedIn I have improved this from previous versions—we now have a better estimate of people with Pacific Islander as one ‘race’ of several in Hawaii, and a more comparable definition of Greater Wellington. But what we see is broadly the same message as before—of the top ten Pacific Islander cities in the world, two are in Australia (Sydney and Brisbane) and one each in New Zealand (Auckland) and the USA (Honolulu).</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0312-pacific-cities-revised.svg" width="450"><img src="https://i2.wp.com/freerangestats.info/img/0312-pacific-cities-revised.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>All this has substantial implications for cultural and national identity, for economics and for politics. What political rights and expectations are there for these overseas people? For the home country, is there a minimum size that is viable? Or even a minimum proportion of your people? What happens legally, politically and diplomatically (for example, to votes in the United Nations General Assembly) if—as is clearly possible under climate change—a low-lying coral atoll country like Tokelau, Marshall Islands or Tuvalu loses all of its land to sea-level rise and <em>all</em> of its people are living overseas?</p>

<h2 id="remittances-are-a-critical-even-dominant-part-of-many-but-not-all-pacific-island-economies">Remittances are a critical, even dominant, part of many (but not all) Pacific island economies</h2>

<p>Without going to these extreme scenarios we have enormous economic implications in the here and now. One of the strongest indicators of this is the level of remittances. Remittances are payments from family or other contacts overseas, typically in a higher income country. The source of remittances can be people on relatively short trips overseas—in the Pacific, examples include people in the Pacific Australia Labour Mobility scheme or the New Zealand Recognised Seasonal Employer scheme—or from long term migrants who have made the other country their indefinite home.</p>

<p>We see from this final chart that some Pacific island countries have extraordinarily high levels of remittances compared to averages of comparable countries, including other small states:</p>

<object type="image/svg+xml" data="https://freerangestats.info/img/0315-remittances-bar.svg" width="450"><img src="https://i2.wp.com/freerangestats.info/img/0315-remittances-bar.png?w=450&#038;ssl=1" data-recalc-dims="1" /></object>

<p>Interestingly, the three highest countries on this measure are <em>not</em> on my list of countries with special access to permanent residency in a large rich country. But as previously mentioned, Samoa and Tonga have particularly strong ties to New Zealand that go partway towards such special access. Vanuatu does not, but it is the beneficiary of short term labour schemes. It&#8217;s also possible that remittances are under-reported in some countries.</p>

<p>For countries like Tonga, it seems likely or at least possible that remittances are coming from long term migrants in New Zealand. After a generation or three, will they stop sending money back to Tonga? Possibly, but as we have seen there is continual refreshment in the form of new migrants, and it is likely the remittances will continue for the foreseeable future.</p>

<p>One thing is clear&#8212;labour mobility is an enduring feature of the Pacific region that meets multiple groups&#8217; needs and leaves a decisive mark on both sending and receiving economies. Remittances are just the most direct part of the economic impact; others include investment ties and human capability development. But I won&#8217;t go into a literature review on this area.</p>

<h2 id="there-are-important-policy-implications">There are important policy implications</h2>

<p>Why do we care? There are important implications of all this. The context of my original talk was a meeting bringing together heads of National Statistical Offices with heads of national planning, and I wanted to highlight population issues as one area where the links between official statistics and national planning and policy are (or should be) particularly strong. Some of the key economic and planning issues in this regard as I see them include:</p>

<ul>
  <li>Population projections (including at sub-national level) and the obvious implications for infrastructure and related planning (if and where to build roads, medical centres, schools, etc.)</li>
  <li>Impacts of migration (short and long term) on the working-age population back at home and what planning or policy levers are needed to deal with this, not just lament it</li>
  <li>Social impacts of many working-age people, perhaps disproportionately of one gender, being overseas</li>
  <li>Implications for taxation policy of a large proportion of national income coming in remittances</li>
  <li>Impact of migration (in either direction) on the net fiscal position (is it working-age people coming to our country, paying more tax than they extract in benefits, or vice versa?)</li>
  <li>Specifically, where do retirees end up, and who pays for them?</li>
  <li>Impact of returnees from overseas work experiences—including raised skills and experiences, higher taxation payments, and general impact on national capabilities and capacity</li>
  <li>Impacts on equality back home from different types of people in the diaspora</li>
  <li>What labour market and cultural context should education be preparing young people for?</li>
</ul>

<p>This aims to be an indicative, rather than comprehensive, list. After all, my aim here is to highlight some issues, not venture into policy advice that I have no mandate for and which would require much more systematic evaluation of options and root causes of the problems.</p>

<h2 id="other-posts-in-this-series">Other posts in this series</h2>

<p>The seven blog posts in total in this series are set out below. The first six contain R code and data sources for each chart:</p>
<ul>
  <li><a href="https://freerangestats.info/blog/2025/11/30/pacific-population" rel="nofollow" target="_blank">Visual summaries of population size and growth</a></li>
  <li><a href="https://freerangestats.info/blog/2025/12/04/pacific-net-migration" rel="nofollow" target="_blank">Net migration</a></li>
  <li><a href="https://freerangestats.info/blog/2026/03/01/pacific-pyramids" rel="nofollow" target="_blank">Population pyramids</a></li>
  <li><a href="https://freerangestats.info/blog/2026/02/16/pacific-cities" rel="nofollow" target="_blank">World cities with the most Pacific Islanders</a></li>
  <li><a href="https://freerangestats.info/blog/2026/02/18/pacific-diaspora" rel="nofollow" target="_blank">Pacific diaspora</a></li>
  <li><a href="https://freerangestats.info/blog/2026/03/08/remittances" rel="nofollow" target="_blank">Remittances</a></li>
  <li>Tying it all together (this post today)</li>
</ul>

<div style="border: 1px solid; background: none repeat scroll 0 0 #EDEDED; margin: 1px; font-size: 13px;">
<div style="text-align: center;">To <strong>leave a comment</strong> for the author, please follow the link and comment on their blog: <strong><a href="https://freerangestats.info/blog/2026/03/22/pacific-people-narrative"> free range statistics - R</a></strong>.</div>
<hr />
<a href="https://www.r-bloggers.com/" rel="nofollow">R-bloggers.com</a> offers <strong><a href="https://feedburner.google.com/fb/a/mailverify?uri=RBloggers" rel="nofollow">daily e-mail updates</a></strong> about <a title="The R Project for Statistical Computing" href="https://www.r-project.org/" rel="nofollow">R</a> news and tutorials about <a title="R tutorials" href="https://www.r-bloggers.com/how-to-learn-r-2/" rel="nofollow">learning R</a> and many other topics. <a title="Data science jobs" href="https://www.r-users.com/" rel="nofollow">Click here if you're looking to post or find an R/data-science job</a>.

<hr>Want to share your content on R-bloggers?<a href="https://www.r-bloggers.com/add-your-blog/" rel="nofollow"> click here</a> if you have a blog, or <a href="http://r-posts.com/" rel="nofollow"> here</a> if you don't.
</div><strong>Continue reading</strong>: <a href="https://www.r-bloggers.com/2026/03/pacific-island-demograpy-the-narrative-by-ellis2013nz/">Pacific island demography, the narrative by @ellis2013nz</a>]]></content:encoded>
					
		
		<enclosure url="" length="0" type="" />

		<post-id xmlns="com-wordpress:feed-additions:1">400022</post-id>	</item>
	</channel>
</rss>
